Consider a data set train:
z a
1 1
0 2
0 1
1 3
0 1
1 2
1 1
0 3
0 1
1 3
with a binary outcome variable z and a categorical predictor a with three levels: 1, 2, and 3.
Now consider a data set test:
z a
1
1
2
1
2
2
1
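For reproducibility, here is a minimal sketch of how the two data sets could be built in R. It assumes that z and a are stored as factors and that z is simply not available in the test set; neither detail is stated explicitly above. The last two lines show how the levels of a differ between the two data frames.

```r
# Training data: binary outcome z and three-level predictor a,
# both stored as factors (an assumption about the original setup).
train <- data.frame(
  z = factor(c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1)),
  a = factor(c(1, 2, 1, 3, 1, 2, 1, 3, 1, 3))
)

# Test data: only the predictor a is filled in (assumption: z is unknown here).
test <- data.frame(a = factor(c(1, 1, 2, 1, 2, 2, 1)))

# Compare the factor levels of a in the two data sets.
levels(train$a)  # "1" "2" "3"
levels(test$a)   # "1" "2"
```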
When I run the following code:
library(randomForest)
set.seed(825)
RFfit1 <- randomForest(z~a, data=train, importance=TRUE, ntree=2000)
RFprediction1 <- predict(RFfit1, test)
I get the following error message:
Error in predict.randomForest(RFfit1, test1) :
Type of predictors in new data do not match that of the training data.
I assume this is because the variable a in the test data set contains only two of the three levels (level 3 never occurs). How can I fix this?
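If the level mismatch is indeed the cause, one common approach is to rebuild the test factor with the levels of the training factor before calling predict. The sketch below assumes z and a are factors and that z is absent from the test set. The requireNamespace guard is only there so the snippet still runs when randomForest is not installed. The same error can also appear when a is numeric in one data set and a factor in the other, so it is worth checking str(train) and str(test) as well.

```r
# Data from the question, with z and a stored as factors (an assumption).
train <- data.frame(
  z = factor(c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1)),
  a = factor(c(1, 2, 1, 3, 1, 2, 1, 3, 1, 3))
)
test <- data.frame(a = factor(c(1, 1, 2, 1, 2, 2, 1)))

# Rebuild the test factor using the training levels, so that a has
# levels 1, 2, 3 in both data sets even though 3 never occurs in test.
test$a <- factor(test$a, levels = levels(train$a))

# Fit and predict only if the randomForest package is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(825)
  RFfit1 <- randomForest(z ~ a, data = train, importance = TRUE, ntree = 2000)
  RFprediction1 <- predict(RFfit1, test)
  print(RFprediction1)
}
```

Taking the levels from levels(train$a) rather than typing them by hand keeps the two data sets aligned if the training data later gains or loses a level.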