Random Forest
Hello,
Any help on the following issue would be greatly appreciated:
I have a dataset built from several satellite images, which I use with Random Forest ML regression in JASP to estimate wheat yield (the target, which was measured). The dataset covers three different years and contains 18926 samples (pixels). Each satellite image has 12 spectral bands, and each band of each image is treated as an independent variable (predictor).
When I run the regression on all data (default data split settings) I get a very good prediction, with R2 = 0.94. But when I use the test set indicator to train the model on the data from two years and predict the third year, the result is very disappointing (R2 < 0.5). This is the case for every combination of two years for training and the third one for validation.
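To make the second setup concrete, here is a minimal sketch in Python (not JASP) of the leave-one-year-out split described above. The year labels are hypothetical, since the actual years are not stated, and the helper names `leave_one_year_out` and `indicator_for` are made up for illustration. The 0/1 column assumes the test set indicator marks test rows with 1 and training rows with 0.

```python
from collections import defaultdict


def leave_one_year_out(years):
    """Yield (held_out_year, train_idx, test_idx) once per distinct year.

    `years` holds one year label per sample (pixel). All samples from the
    held-out year go to the test set; all other samples go to training.
    """
    by_year = defaultdict(list)
    for i, year in enumerate(years):
        by_year[year].append(i)
    for held_out in sorted(by_year):
        test_idx = by_year[held_out]
        train_idx = [i for y in sorted(by_year) if y != held_out
                     for i in by_year[y]]
        yield held_out, train_idx, test_idx


def indicator_for(years, held_out):
    """Build a 0/1 column: 1 for rows in the held-out year, 0 otherwise."""
    return [1 if year == held_out else 0 for year in years]


# Hypothetical year label per pixel (the real dataset has 18926 rows).
years = [2019, 2019, 2020, 2020, 2020, 2021]

for held_out, train_idx, test_idx in leave_one_year_out(years):
    print(held_out, "train:", train_idx, "test:", test_idx)

print(indicator_for(years, 2021))
```

Unlike a random split, no pixel from the held-out year is ever seen during training, so the model has to generalise to the conditions of a year it has never seen.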
Screenshots from the two analyses may be found below. In this example only one image has been used for simplicity, but the results are similar when all images are used.
Variables range per year:
Default data split (all 3 years for training):
Data split with 2 years for training and the third year for prediction (using test set indicator):
Comments
Dear Akypar,
I'll alert our expert to this. In general, predictive performance is always less impressive than goodness-of-fit (a Danish proverb goes "prediction is difficult, especially about the future" :-)).
E.J.