When I run a random forest model on my test data, I get different results for the same data set and the same model.
Here are the results of two identical calls; the difference is in the first column:
    > table(predict(rfModelsL[[1]], newdata = a), a$earlyR)

            FALSE TRUE
      FALSE    14    7
      TRUE     13   66

    > table(predict(rfModelsL[[1]], newdata = a), a$earlyR)

            FALSE TRUE
      FALSE    15    7
      TRUE     12   66
Although the difference is very small, I'm trying to understand what causes it. I'm guessing that predict
has a "flexible" classification threshold, although I couldn't find that in the documentation. Am I right?
Thank you in advance.
randomForest
a bit closer. It explains perfectly why this is documented behaviour. Your randomForest is a collection of trees, and each time you fit the model you'll end up with a slightly different set of trees. That has nothing to do with the predict function; it is simply how random forests work. Besides that, questions about statistical techniques belong on stats.stackexchange.com, not on Stack Overflow. – Joris Meys
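For anyone who wants to check this on their own data, here is a minimal sketch, not the asker's actual setup. It uses a two-class subset of iris as hypothetical stand-in data, and the names d, fit, votes and tied are made up for illustration. It fixes the seed before fitting so the set of trees is reproducible, then looks at the raw vote counts of the fitted forest. As far as I remember, the help page for predict.randomForest notes that ties are broken at random, so a row with an exact tie could flip between two predict calls on the same model; treat that as something to verify with ?predict.randomForest rather than as a confirmed diagnosis of the tables above.

```r
library(randomForest)  # assumes the randomForest package is installed

# Hypothetical stand-in data: iris reduced to two classes
d <- droplevels(iris[iris$Species != "setosa", ])

# Fixing the seed before fitting makes the set of trees reproducible
set.seed(42)
fit <- randomForest(Species ~ ., data = d, ntree = 500)

# Raw vote counts per class for every row (one vote per tree)
votes <- predict(fit, newdata = d, type = "vote", norm.votes = FALSE)

# Rows where both classes received the same number of votes
tied <- votes[, 1] == votes[, 2]
sum(tied)

# Setting the same seed before each predict call should make any
# random tie-breaking repeatable (assumes it draws from R's RNG)
set.seed(1); p1 <- predict(fit, newdata = d)
set.seed(1); p2 <- predict(fit, newdata = d)
identical(p1, p2)
```

If sum(tied) is greater than zero on your data, an odd ntree (for example 501) rules out exact ties in a two-class problem with the default equal cutoffs.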