I am trying to use the quantile regression forest function in R (quantregForest), which is built on the randomForest package. I am getting a type mismatch error and can't figure out why.
I train the model by using
qrf <- quantregForest(x = xtrain, y = ytrain)
which works without a problem, but when I try to test with new data like
quant.newdata <- predict(qrf, newdata= xtest)
it gives the following error:
Error in predict.quantregForest(qrf, newdata = xtest) :
Type of predictors in new data do not match types of the training data.
My training and test data come from separate files (hence separate data frames) but have the same format. I checked the classes of the predictors with
sapply(xtrain, class)
sapply(xtest, class)
Here is the output:
> sapply(xtrain, class)
pred1 pred2 pred3 pred4 pred5 pred6 pred7 pred8
"factor" "integer" "integer" "integer" "factor" "factor" "integer" "factor"
pred9 pred10 pred11 pred12
"factor" "factor" "factor" "factor"
> sapply(xtest, class)
pred1 pred2 pred3 pred4 pred5 pred6 pred7 pred8
"factor" "integer" "integer" "integer" "factor" "factor" "integer" "factor"
pred9 pred10 pred11 pred12
"factor" "factor" "factor" "factor"
They are exactly the same. I also checked for NA values: neither xtrain nor xtest contains any. Am I missing something trivial here?
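One thing `sapply(..., class)` cannot show is which levels each factor column has: two columns can both be "factor" while carrying different level sets. The following is a minimal base-R sketch of a per-column level comparison. The toy data frames `train_df` and `test_df` and the helper `compare_levels` are hypothetical names made up for illustration; they are not part of quantregForest, and whether differing levels are what triggers the error here is an assumption.

```r
# Toy stand-ins for xtrain / xtest: same column classes, different factor levels.
train_df <- data.frame(pred1 = factor(c("a", "b", "c")), pred2 = 1:3)
test_df  <- data.frame(pred1 = factor(c("a", "b", "b")), pred2 = 4:6)

# Both report "factor" / "integer", so a class check alone sees no difference.
same_classes <- identical(sapply(train_df, class), sapply(test_df, class))

# Hypothetical helper: for each factor column in `train`, report whether the
# level vectors of the two data frames are identical (same values, same order).
compare_levels <- function(train, test) {
  factor_cols <- names(train)[sapply(train, is.factor)]
  sapply(factor_cols, function(col) {
    identical(levels(train[[col]]), levels(test[[col]]))
  })
}

level_match <- compare_levels(train_df, test_df)
print(same_classes)
print(level_match)
```

Applied to the real data, this would be `compare_levels(xtrain, xtest)`; any column reported as `FALSE` has levels that differ between the two data frames.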
Update I: Running the prediction on the training data also fails, this time with a different message:
> quant.newdata <- predict(qrf, newdata = xtrain)
Error in predict.quantregForest(qrf, newdata = xtrain) :
names of predictor variables do not match
Update II: I combined my training and test sets so that rows 1 to 101 are the training data and the rest are the test data. I modified the example provided with quantregForest as follows:
data <- read.table("toy.txt", header = TRUE)
n <- nrow(data)
indextrain <- 1:101
xtrain <- data[indextrain, 3:14]
xtest <- data[-indextrain, 3:14]
ytrain <- data[indextrain, 15]
ytest <- data[-indextrain, 15]
qrf <- quantregForest(x=xtrain, y=ytrain)
quant.newdata <- predict(qrf, newdata= xtest)
And it works! I'd appreciate it if anyone could explain why it works this way but not the other way.
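One possible explanation, offered as an assumption rather than something confirmed against the quantregForest source, is that factor columns read from a single file share one set of levels after the row split, while factor columns read from two separate files each get only the levels observed in that file. The base-R sketch below illustrates the difference with made-up values; all object names are hypothetical.

```r
# Toy column standing in for one factor predictor (made-up values).
values <- c("a", "b", "c", "a", "b")

# Case 1: one data frame, then split by row index (as in Update II).
combined   <- data.frame(pred1 = factor(values))
train_part <- combined[1:3, , drop = FALSE]
test_part  <- combined[-(1:3), , drop = FALSE]
# Row subsetting keeps the full level set, including the unused level "c".
print(levels(train_part$pred1))
print(levels(test_part$pred1))

# Case 2: two data frames built separately (as with two input files).
train_sep <- data.frame(pred1 = factor(values[1:3]))
test_sep  <- data.frame(pred1 = factor(values[4:5]))
# Here the test factor only knows the values it has seen.
print(levels(test_sep$pred1))

# One possible alignment: re-declare the test factor with the training levels.
test_sep$pred1 <- factor(test_sep$pred1, levels = levels(train_sep$pred1))
print(levels(test_sep$pred1))
```

Note that re-declaring a factor this way turns any test value that is absent from the training levels into NA, so it is worth checking for NA values again afterwards.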
pred1 values that have different types doesn't seem like a great idea. Maybe change the factor one to be called `pred1.factor`? – Andy Clifton