
I am using a random forest model (randomForest package) in R to predict a binary outcome, 0 or 1. My input data contain categorical variables coded as numbers, and I convert them to factors with the factor() function before training. So for every categorical variable x, my code looks like this:

feature_x1 <- factor(feature_x1)  # convert the variable to a factor in the training data
# This variable takes 3 levels: 0, 1, 2

This works fine during training. Let us assume my model object is rf_model. When I run the model on new data, which is just a vector of numbers, I first convert the value of feature_x1 into a factor:

newdata <- data.frame(1, 2)
colnames(newdata) <- c("feature_x1", "feature_x2")
newdata$feature_x1 <- factor(newdata$feature_x1)
score <- predict(rf_model, newdata, type = "prob")

I receive the following error:

Error in predict.randomForest(rf_model, newdata,type = "prob") : New factor levels not present in the training data
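The level mismatch behind this error can be seen without any model. The sketch below uses only base R; the training vector values and the name `new_x1` are hypothetical stand-ins for the question's data. It shows that factor() builds its level set only from the values it is given, so a single new record ends up with a single level.

```r
# Hypothetical training column: three categories coded as numbers
feature_x1 <- factor(c(0, 1, 2, 1, 0))
levels(feature_x1)
# [1] "0" "1" "2"

# A single new record: factor() only sees the value 1, so it creates one level
new_x1 <- factor(1)
levels(new_x1)
# [1] "1"

# The two level sets differ, which is the kind of mismatch predict() complains about
identical(levels(new_x1), levels(feature_x1))
# [1] FALSE
```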

How can I deal with this error? In practice, after training, the model will always be applied to data whose outcome is unknown, and that data is often just a single record.

Please let me know if more detail or code is required.


Answer


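Try setting the levels explicitly, so that the factor in the new data has exactly the same levels as the factor used in training:

newdata$feature_x1 <- factor(newdata$feature_x1, levels = levels(feature_x1))

Calling factor() on the new data alone creates levels only for the values present there (a single record yields a single level), so the levels no longer match those seen during training.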
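At scoring time the original training vector may no longer be in memory, so one option is to record the levels of every factor column once, right after training, and reapply them to each new record. The sketch below assumes a hypothetical training data frame `train` and a hypothetical helper `align_levels()`; neither comes from the question or from the randomForest package. (randomForest objects also appear to keep the training levels in `rf_model$forest$xlevels`, but that is worth checking against your package version.)

```r
# Hypothetical training data: feature_x1 is categorical (0/1/2), feature_x2 numeric
train <- data.frame(feature_x1 = factor(c(0, 1, 2, 1, 0)),
                    feature_x2 = c(5, 3, 8, 1, 4))

# Record the level set of every factor column once, at training time
# (this list could be saved with saveRDS() next to the model)
train_levels <- lapply(train[vapply(train, is.factor, logical(1))], levels)

# Hypothetical helper: re-encode the factor columns of new data with the training levels.
# A value never seen in training becomes NA and needs separate handling.
align_levels <- function(newdata, levels_list) {
  for (col in names(levels_list)) {
    newdata[[col]] <- factor(newdata[[col]], levels = levels_list[[col]])
  }
  newdata
}

# A single new record, as in the question
newdata <- data.frame(feature_x1 = 1, feature_x2 = 2)
newdata <- align_levels(newdata, train_levels)
levels(newdata$feature_x1)
# [1] "0" "1" "2"

# score <- predict(rf_model, newdata, type = "prob")
```

Keeping the levels in a small list, rather than relying on the training vector, means the same alignment step works for any later batch, including one-row inputs.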