I'm working on the Titanic data set from Kaggle and have done almost all the work on the training set. train has 891 observations of 12 variables, and test has 418 observations of 11 variables.

I fitted a decision tree with rpart. On the training data I get:

confusionMatrix(pred_train, train$Survived)

Confusion Matrix and Statistics

              Reference
    Prediction   0   1
             0 549   0
             1   0 342

               Accuracy : 1                  
                 95% CI : (0.996, 1)         
    No Information Rate : 0.616              
    P-Value [Acc > NIR] : <0.0000000000000002

                  Kappa : 1                  
 Mcnemar's Test P-Value : NA                 

            Sensitivity : 1.000              
            Specificity : 1.000              
         Pos Pred Value : 1.000              
         Neg Pred Value : 1.000              
             Prevalence : 0.616              
         Detection Rate : 0.616              
   Detection Prevalence : 0.616              
      Balanced Accuracy : 1.000              

       'Positive' Class : 0                  

But when I predict on the test set with

pred <- predict(fit, test, type = "class")

I get:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object, : factor Name has new levels Abbott, Master. E...

How can I solve this? The train and test sets have different numbers of observations (891 and 418), and I have already removed the identifier (PassengerId) from the training set.
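For context, the error is about factor levels rather than row counts: predict() does not need test to have as many rows as train, but every factor value in newdata must be a level the model saw during fitting. A minimal base-R reproduction, using lm() as a stand-in for rpart and a hypothetical toy factor f (both are assumptions made for illustration; the traceback above shows rpart going through the same model.frame.default check):

```r
# Hypothetical toy data: factor `f` has only levels "a" and "b" at fit time.
fit_data <- data.frame(y = c(1, 2, 3, 4), f = factor(c("a", "a", "b", "b")))
fit <- lm(y ~ f, data = fit_data)  # lm() used here as a stand-in for rpart

# New data contains the value "c", which the fitted model has never seen.
new_data <- data.frame(f = c("a", "c"))

# predict() rebuilds the model frame using the training levels (xlev),
# so the unseen level makes model.frame.default() stop with an error.
msg <- tryCatch(
  {
    predict(fit, new_data)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

The number of rows in new_data plays no role here; only the unseen level "c" does.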

Answer

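Before training, you need to combine the test and train sets with rbind() (the test set has no Survived column, so add one filled with NA first), convert the categorical columns with factor() on the combined data, and then split it back into "new" train and test sets. That way both sets carry all factor levels.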
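A minimal sketch of that workflow in base R. The tiny train and test data frames, their names, and the helper column is_train are hypothetical stand-ins for the Kaggle files, and only Name and Sex are shown; with the real data you would apply factor() to every categorical column.

```r
# Hypothetical stand-ins for the Kaggle train/test files.
train <- data.frame(
  Name = c("Doe, Mr. John", "Roe, Mrs. Jane"),
  Sex = c("male", "female"),
  Survived = c(0, 1),
  stringsAsFactors = FALSE
)
test <- data.frame(
  Name = "Poe, Mr. Edgar",
  Sex = "male",
  stringsAsFactors = FALSE
)

# test has no Survived column, so add one (all NA) before rbind().
test$Survived <- NA

# Remember where each row came from, then stack the two sets.
train$is_train <- TRUE
test$is_train <- FALSE
full <- rbind(train, test)

# Build the factors on the combined data so every level is known.
full$Name <- factor(full$Name)
full$Sex <- factor(full$Sex)

# Split back into "new" train and test sets; subsetting keeps all levels.
new_train <- full[full$is_train, ]
new_test <- full[!full$is_train, ]
new_train$is_train <- NULL
new_test$is_train <- NULL

# Both sets now share the full level set, including the test passenger's name.
print(levels(new_train$Name))
```

After this, fit the tree on new_train and call predict() on new_test. Note also that Name is essentially unique per passenger, so even with shared levels a split on it tells the model nothing about unseen test passengers; it is likely also why the tree reaches Accuracy 1 on the training data. Dropping Name from the model formula, as was done with PassengerId, avoids the error as well.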