
Here is the problem I am having; I hope someone can explain why it happens.

I have a large dataset that I am using to predict a categorical value (L, M, H). In the original data.frame it is a factor.

The training set is too large to train on in memory, so I took a random sample of it and fit a randomForest. Then I drew a different random sample and fit a second forest, and so on. They all have similar performance, which was a concern.

I found the combine function in randomForest and decided to use it to combine my models.

I then need to use the combined model to score the training set to get an OOB estimate, and then do the same with my validation sample.

I am having a problem with the prediction on the test set.

I basically get a message saying "Error in eval(expr, envir, enclos) : object 'XXX' not found", where XXX is a variable name. But this makes no sense, as the variables never changed names.

I redid this a few times, in case my data got corrupted.

Any idea why I am getting this?

Impossible to say without a reproducible example. But in general, error messages don't lie. If R says that it can't find one of your variables, then it really truly can't find one of your variables. - joran

1 Answer


Without the data it is hard to know, but this is my hunch based on similar errors in the past: if you are sampling your data and fitting separate models, you may run into a problem with categorical variables where the factor levels seen by one model do not match the factor levels seen by another. A potential fix is to set the factor levels explicitly in the data frame (using the levels function, or factor() with a levels argument) before you fit each model.
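As a rough sketch of that idea, the base R snippet below records the full set of levels once and re-imposes it on a sample. The data, the column names (grade, x, target) and the helper align_levels are all hypothetical, made up for illustration. Note that plain row subsetting of a data.frame keeps the levels; they usually go missing when a sample is read from a separate file, rebuilt from strings, or passed through droplevels(), which is what is simulated here.

```r
# Hypothetical data: 'grade' is a factor predictor with one rare level, "D".
set.seed(42)
full <- data.frame(
  grade  = factor(c(rep(c("A", "B", "C"), each = 30), "D")),
  x      = rnorm(91),
  target = factor(sample(c("L", "M", "H"), 91, replace = TRUE),
                  levels = c("L", "M", "H"))
)

# Record the complete level set of every factor column, once, from the full data.
all_levels <- lapply(Filter(is.factor, full), levels)

# Hypothetical helper: re-impose the recorded levels on any subset.
align_levels <- function(df, level_list) {
  for (nm in intersect(names(level_list), names(df))) {
    df[[nm]] <- factor(df[[nm]], levels = level_list[[nm]])
  }
  df
}

# Simulate a sample that lost its unused levels (rows 1:60 contain only "A" and "B").
s1_dropped <- droplevels(full[1:60, ])
levels(s1_dropped$grade)   # "A" "B"

# Restore the full level set before fitting a model on this sample.
s1_fixed <- align_levels(s1_dropped, all_levels)
levels(s1_fixed$grade)     # "A" "B" "C" "D"
```

Applying the same helper to every training sample and to the validation data, before fitting or predicting, keeps the factor coding identical across all the forests.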

Edit: one way to debug is to fit two models on the same sample, combine them, apply the combined model, and see whether you get the same error.
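A minimal sketch of that check, assuming the randomForest package is installed and using the built-in iris data as a stand-in for your own data (the 100-row split and ntree = 50 are arbitrary choices):

```r
library(randomForest)  # assumption: the package is installed

set.seed(1)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

# Fit two forests on the SAME sample, then combine them.
rf1    <- randomForest(Species ~ ., data = train, ntree = 50)
rf2    <- randomForest(Species ~ ., data = train, ntree = 50)
rf_all <- combine(rf1, rf2)

# "object 'XXX' not found" means a variable used by the model is absent from
# the data passed to predict(), so list any model variables missing in 'test'.
model_vars <- attr(rf_all$terms, "term.labels")
missing_in_test <- setdiff(model_vars, names(test))
missing_in_test

# If nothing is missing, prediction with the combined forest should run.
pred <- predict(rf_all, newdata = test)
table(pred, test$Species)
```

If this runs cleanly but the same steps fail with forests fit on different samples, the difference between the samples (column names or factor levels) is the likely culprit. Also note that, per the package documentation, combine() sets the err.rate and confusion components of the combined object to NULL, so the combined forest does not carry an OOB estimate of its own.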