Problem with ML training
Hi everybody
I need to train an ML algorithm for classification. I followed the instructions provided at https://jasp-stats.org/2019/10/07/how-to-train-a-machine-learning-model-in-jasp-classification/ but I am unsure how to add "...a custom made indicator called testIndicator that represents 20% of the data that contains an equal proportion of [...]".
Do you have any suggestions on how to do this?
Thanks
Comments
You would need to create this indicator outside of JASP. But this is typically only required if the classes are unbalanced in the training and/or test set. If the classes are relatively balanced, you can also let JASP sample a fixed % of the data as training and test set via the Data Split Preferences. I would try that first and check the model performance. If the performance is bad (e.g., the model heavily prefers one class because of the training data), then it could be useful to create a custom test set indicator in R, Excel, or a similar tool.
Thanks koenderks for your reply. My classes are unbalanced (80%-20%), so the model performance is terrible. If I understood correctly, the test indicator represents whether each case is included (1) or excluded (0) from the model, so I randomly chose 20% with an equal number of positive and negative cases from the whole cohort to train the algorithm.
Specifically, the test set indicator represents whether each case is included (1) or excluded (0) from the test set (not the model). The cases for which the test set indicator is 0 are used for training and those for which it is 1 are used for evaluating the model performance.
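In case it helps, here is a minimal Python sketch of how such an indicator could be built outside of JASP. It is only an illustration under a few assumptions: the quoted tutorial sentence is truncated, so the sketch assumes the goal is a test set of about 20% of the cases that keeps the same class proportions as the full data (a stratified split). The function name `make_test_indicator`, the labels "neg" and "pos", and the 80/20 example data are all hypothetical and only mirror the imbalance described above.

```python
import random
from collections import defaultdict


def make_test_indicator(labels, test_fraction=0.2, seed=42):
    """Return one 0/1 flag per case: 1 = test set, 0 = training set.

    Hypothetical helper, not part of JASP. It samples the same fraction
    from every class, so the test set keeps the class proportions of the
    full data (assumption: this is what the tutorial intends).
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible

    # Group row positions by class label.
    rows_by_class = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_class[label].append(row)

    indicator = [0] * len(labels)
    for rows in rows_by_class.values():
        n_test = round(len(rows) * test_fraction)
        # Mark a random subset of this class as test cases.
        for row in rng.sample(rows, n_test):
            indicator[row] = 1
    return indicator


# Hypothetical data with the 80%/20% imbalance mentioned in the thread.
labels = ["neg"] * 80 + ["pos"] * 20
test_indicator = make_test_indicator(labels)
```

The resulting list can be saved as an extra column (for example named testIndicator) next to the original data, so that cases with 0 are used for training and cases with 1 for evaluation, as described above.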