factor name has new levels while using predict function in test data set

Question

I'm trying to solve the Titanic data set from Kaggle, and I have done almost all the work on the train data set. The data: train (891 obs. of 12 variables), test (418 obs. of 11 variables).

I used a decision tree (the rpart method).

Running confusionMatrix(pred_train, train$Survived) on the train data set gives:

    Confusion Matrix and Statistics

              Reference
    Prediction   0   1
             0 549   0
             1   0 342

               Accuracy : 1                  
                 95% CI : (0.996, 1)         
    No Information Rate : 0.616              
    P-Value [Acc > NIR] : <0.0000000000000002

                  Kappa : 1                  
 Mcnemar's Test P-Value : NA                 

            Sensitivity : 1.000              
            Specificity : 1.000              
         Pos Pred Value : 1.000              
         Neg Pred Value : 1.000              
             Prevalence : 0.616              
         Detection Rate : 0.616              
   Detection Prevalence : 0.616              
      Balanced Accuracy : 1.000              

       'Positive' Class : 0

When I run pred <- predict(fit, test, type = "class") I get:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object, : factor Name has new levels Abbott, Master. E...

How can I solve this problem? The train and test data sets have different numbers of observations (891 and 418), and I have already removed the identifier (passengerId) from the train data set.

s.brunel · Accepted Answer · 2017-06-13T14:18:47

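The error means the test set contains values of the factor Name that were not among the levels seen during training. Before training, you need to rbind test and train, then call factor on the combined data, and finally extract the "new" train and test, which now share all factor levels.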
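The steps above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the two small data frames are made-up stand-ins for the Kaggle files, and the names combined, train2, test2 and n_train are hypothetical. It assumes the rpart package is installed.

```r
library(rpart)

# Toy stand-ins for the Kaggle train/test frames (made-up data).
train <- data.frame(
  Survived = c(0, 1, 1, 0, 1, 0),
  Sex      = c("male", "female", "female", "male", "female", "male"),
  Embarked = c("S", "C", "S", "S", "C", "S"),
  stringsAsFactors = FALSE
)
test <- data.frame(
  Sex      = c("female", "male"),
  Embarked = c("Q", "S"),  # "Q" never appears in train
  stringsAsFactors = FALSE
)

# Give test the missing response column so the frames can be stacked.
test$Survived <- NA
combined <- rbind(train, test)  # data frame columns are matched by name

# Build the factors on the combined data so the levels cover both sets.
combined$Sex      <- factor(combined$Sex)
combined$Embarked <- factor(combined$Embarked)
combined$Survived <- factor(combined$Survived)

# Extract the "new" train and test by row position.
n_train <- nrow(train)
train2  <- combined[seq_len(n_train), ]
test2   <- combined[-seq_len(n_train), ]

# Fit on the new train and predict on the new test.
fit  <- rpart(Survived ~ Sex + Embarked, data = train2, method = "class")
pred <- predict(fit, test2, type = "class")
```

Because both halves come from the same combined factor, levels(train2$Embarked) and levels(test2$Embarked) are identical, so predict no longer reports new levels. A level that occurs only in the test rows still carries no information from training, so this removes the error rather than teaching the model anything about that level.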
