Consider a data set train:

    z  a  
    1  1  
    0  2  
    0  1
    1  3
    0  1
    1  2
    1  1
    0  3
    0  1
    1  3

with a binary outcome variable z and a categorical predictor a with three levels: 1, 2 and 3.

Now consider a data set test, in which z is not observed:

    z  a
       1
       1
       2
       1
       2
       2
       1

When I run the following code:

    library(randomForest)
    set.seed(825)
    RFfit1 <- randomForest(z ~ a, data = train, importance = TRUE, ntree = 2000)
    RFprediction1 <- predict(RFfit1, test)

I get the following error message:

    Error in predict.randomForest(RFfit1, test) :
      Type of predictors in new data do not match that of the training data.

I assume this is because the variable a in the test data set has only two of the three levels (3 never occurs). How can I fix this?
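One way to check this diagnosis is to compare the factor levels directly. The sketch below types column a in from the two tables and assumes it is stored as a factor in both data sets; the names train_a and test_a are hypothetical. As far as I can tell from the randomForest source, predict.randomForest compares the number of levels of each factor predictor with what it saw during training and raises this error on a mismatch, so a count of 3 versus 2 would explain it.

```r
# Column a from each table, assumed to be stored as a factor (hypothetical names)
train_a <- factor(c(1, 2, 1, 3, 1, 2, 1, 3, 1, 3))
test_a  <- factor(c(1, 1, 2, 1, 2, 2, 1))

nlevels(train_a)                          # 3
nlevels(test_a)                           # 2: "3" never occurs in test, so factor() omits it
setdiff(levels(train_a), levels(test_a))  # "3"
```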

1 Answer

You must give test$a the same factor levels as train$a:

    # Re-encode test$a with every level seen in training, even those absent from test
    test$a <- factor(test$a, levels = levels(train$a))
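The fix can be checked end to end on the data from the question. This is a minimal base-R sketch: it reconstructs train and test by hand from the tables (assuming a is a factor, and converting z to a factor on the assumption that a classification forest is wanted, since a numeric 0/1 response would make randomForest fit a regression), then applies the line above and confirms that the levels now agree.

```r
# Data from the question, reconstructed by hand (assumed encoding)
train <- data.frame(
  z = factor(c(1, 0, 0, 1, 0, 1, 1, 0, 0, 1)),  # binary outcome stored as a factor
  a = factor(c(1, 2, 1, 3, 1, 2, 1, 3, 1, 3))   # categorical predictor
)
test <- data.frame(a = factor(c(1, 1, 2, 1, 2, 2, 1)))

nlevels(test$a)  # 2 before the fix

# Re-encode test$a with the full level set from train$a
test$a <- factor(test$a, levels = levels(train$a))

nlevels(test$a)                             # 3
table(test$a)                               # counts 4, 3, 0 for levels "1", "2", "3"
identical(levels(test$a), levels(train$a))  # TRUE
anyNA(test$a)                               # FALSE: every test value is a known level
```

With the levels aligned, predict(RFfit1, test) should no longer fail the type check. Note that factor() turns any test value that is not among levels(train$a) into NA, so the anyNA() check is worth keeping when new data may contain categories never seen in training.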