How do I predict on test data with randomForest when the target column (is_promoted) is missing from the test data set?
I have been given two data sets, Train and Test. For the Test data set I have to predict whether each employee will be promoted or not.
The Train data set has the is_promoted column, which has been used to build the model. I used Test$is_promoted=NA to add an is_promoted column to my Test data set so that both data sets have the same columns during data preparation.
But when I use random forest to predict the final values, those NA values are reported as a missing-value error.
set.seed(123)
rf_m3 = randomForest(is_promoted ~ ., data = FinalTest, ntree = 150, nodesize = 50, mtry = 5)
rf_test_pred = predict(rf_m3, FinalTest, type = "class")
Error message:
Error in na.fail.default(list(is_promoted = c(NA_integer_, NA_integer_, :
missing values in object
I can't remove is_promoted either, since it is my dependent variable.
Please suggest a way to handle this issue and the code changes required.
PS: I am a new learner trying random forest for the first time, so please explain in as much detail as possible.
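If the placeholder column exists only so that Train and Test can go through the same preparation steps, one possible approach is to stack the two sets, prepare them together, and then split them again, dropping the placeholder from the test part. The sketch below assumes tiny made-up data frames; the columns department and score and the helper column source are hypothetical and not taken from the real data.

```r
# Hypothetical toy data standing in for the real Train and Test sets.
Train <- data.frame(
  department  = c("Sales", "HR", "Sales"),
  score       = c(70, 85, 90),
  is_promoted = c(0L, 1L, 1L)
)
Test <- data.frame(
  department = c("HR", "Sales"),
  score      = c(60, 95)
)

# Mark where each row came from (hypothetical helper column).
Train$source <- "train"
Test$source  <- "test"

# Placeholder target, needed only so rbind() sees the same column names.
Test$is_promoted <- NA

# rbind() on data frames matches columns by name.
combined <- rbind(Train, Test)

# Shared preparation step: both parts end up with the same factor levels.
combined$department <- factor(combined$department)

# Split again. The test part drops the placeholder target.
is_train   <- combined$source == "train"
FinalTrain <- combined[is_train, setdiff(names(combined), "source")]
FinalTest  <- combined[!is_train, setdiff(names(combined), c("source", "is_promoted"))]

# A factor target makes randomForest treat the problem as classification.
FinalTrain$is_promoted <- factor(FinalTrain$is_promoted)
```

After the split, FinalTest contains only predictors, so nothing with NA in the target is ever passed to the model.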
is_promoted is missing. You shouldn't create such a column when using predict. – nicola
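To illustrate the comment: the model in the question is fitted on FinalTest, where is_promoted is all NA, and the formula interface of randomForest uses na.fail by default, which would explain the error shown. A minimal sketch of the usual pattern is to fit on the training data and pass the test predictors, without any is_promoted column, as newdata. The data below is made up for illustration; the predictor names and the tuning values are assumptions, not the real data set or recommended settings.

```r
library(randomForest)

set.seed(123)

# Hypothetical toy data; the real Train/Test sets have different columns.
n_train <- 200
Train <- data.frame(
  age                  = sample(20:60, n_train, replace = TRUE),
  avg_training_score   = sample(40:99, n_train, replace = TRUE),
  previous_year_rating = sample(1:5, n_train, replace = TRUE)
)
# The target exists only in the training data and is stored as a factor,
# so randomForest fits a classifier rather than a regression.
Train$is_promoted <- factor(as.integer(Train$avg_training_score > 80))

n_test <- 50
Test <- data.frame(
  age                  = sample(20:60, n_test, replace = TRUE),
  avg_training_score   = sample(40:99, n_test, replace = TRUE),
  previous_year_rating = sample(1:5, n_test, replace = TRUE)
)

# Fit on the TRAINING data (illustrative tuning values).
rf_fit <- randomForest(is_promoted ~ ., data = Train,
                       ntree = 150, nodesize = 5, mtry = 2)

# Predict on the test predictors; no is_promoted column is needed here.
rf_test_pred <- predict(rf_fit, newdata = Test, type = "response")

# Keep the predictions next to the test rows.
Test$is_promoted_pred <- rf_test_pred
```

The dependent variable is required only when fitting. predict() needs just the predictor columns that were used in training, with matching names and types.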