
I'm new to Random Forests in R, and I'm trying to make a prediction. I have built a Random Forest model using the following code, which runs without errors:

library(randomForest)
RF_model = randomForest(trainrows[,col_truth]~.
                    ,data = trainrows[,cols_to_use]
                    ,ntree=100
                    ,do.trace=T)

If I print out RF_model, I get the following output:

Call:
 randomForest(formula = trainrows[, col_truth] ~ ., data = trainrows[,      cols_to_use], ntree = 100, do.trace = T) 
               Type of random forest: classification
                     Number of trees: 100
No. of variables tried at each split: 4

        OOB estimate of  error rate: 19.23%
Confusion matrix:
     0    1 class.error
0 7116 1640   0.1873001
1 1725 7015   0.1973684
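The class.error column and the OOB error rate in this output can both be recomputed from the confusion-matrix counts. Each class error is that row's misclassified count divided by the row total, and the overall OOB error is the total off-diagonal count divided by all observations. Here is a small base-R check using the counts printed above (the variable names are just for illustration):

```r
# OOB confusion matrix copied from the printed output
# (rows = true class, columns = predicted class)
cm <- matrix(c(7116, 1640,
               1725, 7015),
             nrow = 2, byrow = TRUE,
             dimnames = list(true = c("0", "1"), predicted = c("0", "1")))

# Per-class error: misclassified count / row total
class_error <- 1 - diag(cm) / rowSums(cm)

# Overall OOB error: all off-diagonal counts / total count
oob_error <- 1 - sum(diag(cm)) / sum(cm)

print(round(class_error, 7))      # should match the class.error column
print(round(100 * oob_error, 2))  # should match the 19.23% OOB estimate
```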

Then, when I try to make a prediction with the model, I get the following error:

> predict(RF_model)
Error in 1:dim(data)[1] : argument of length 0

I have tried supplying data to the predict method, but I get the same error. Does anyone know what's going on and how to fix it?

EDIT

To provide some more data, I have tried using Random Forests with the built-in iris dataset:

rf = randomForest(iris[,1]~., data=iris[,c(1, 2)], ntree=100)
predict(rf)
Error in 1:dim(data)[1] : argument of length 0

I think this is not related to my data but is a problem with my version of R. Any ideas?

Please include sample data to make your example reproducible. Feel free to use a built-in data set, but unless we can run the same code and get the same error, it's difficult to help. - MrFlick
rf = randomForest(iris[,1]~., data=iris[,c(1, 2)], ntree=100); predict(rf) works fine, so this issue is probably specific to your dataset. Please include a reproducible example. - josliber
If I had to guess, the problem is likely related to your formula specification, which follows none of the conventions for specifying formulas in R. Formulas contain names of columns. DO NOT mix subsetting into your formulas. Ever. - joran
I have just edited my question to include more data. - Jon
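To illustrate joran's point about formulas: when the left-hand side is a subsetting expression such as iris[,1] rather than a column name, R cannot recognise that column as the response, so the `.` shorthand expands to every column of `data`, including the response itself. The minimal base-R sketch below uses terms() to show which predictors `.` expands to in each case (the object names bad/good are illustrative):

```r
# Formula as written in the question: the response is a subsetting
# expression, not a column name of `data`
bad <- terms(iris[, 1] ~ ., data = iris[, c(1, 2)])
attr(bad, "term.labels")
# Sepal.Length is listed as a predictor even though it is also the response

# Conventional form: name the response column and let `.` mean
# "all other columns of data"
good <- terms(Sepal.Length ~ ., data = iris[, c(1, 2)])
attr(good, "term.labels")
```

The same convention applies to randomForest(), e.g. randomForest(Species ~ ., data = iris, ntree = 100). randomForest() treats a factor response as classification and anything else as regression, so for a classification forest the named response column should be a factor.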

1 Answer


When you use the predict function, you are trying to predict the outcome (the labels) for your test set, so pass the test set as the second argument:

rf_predict <- predict(RF_model, test_set)
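Here is a minimal end-to-end sketch on the built-in iris data, assuming a simple random 70/30 split. The split proportion, the seed, and the names train_set, test_set and oob_predict are illustrative choices, not from the question. According to the randomForest documentation, calling predict() without newdata returns the out-of-bag predictions stored in the model, whereas passing newdata predicts for those rows:

```r
library(randomForest)

set.seed(42)  # illustrative seed, only for a reproducible split
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

# Name the response column in the formula and pass the whole data frame;
# Species is a factor, so a classification forest is grown
RF_model <- randomForest(Species ~ ., data = train_set, ntree = 100)

# No newdata: out-of-bag predictions for the training rows
oob_predict <- predict(RF_model)

# With newdata: predictions for the held-out test rows
rf_predict <- predict(RF_model, newdata = test_set)
```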

You can create a confusion matrix to assess the accuracy of your random forest by using the table function:

table(observed, rf_predict)

Note: observed holds the correct (true) labels for the test set.
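One pitfall with table(observed, rf_predict) is that if the predictions never contain one of the classes, or the two vectors carry different factor levels, the table is not square and its diagonal no longer lines up with the correct predictions. Below is a small base-R sketch that forces a shared set of levels and also reports overall accuracy. It assumes both inputs are vectors of class labels; the function name confusion_summary and the toy vectors are hypothetical:

```r
# Hypothetical helper: confusion matrix plus overall accuracy
confusion_summary <- function(observed, predicted) {
  # One shared set of levels keeps the table square, so the
  # diagonal holds exactly the correct predictions
  lv <- union(levels(factor(observed)), levels(factor(predicted)))
  cm <- table(observed  = factor(observed,  levels = lv),
              predicted = factor(predicted, levels = lv))
  list(confusion = cm, accuracy = sum(diag(cm)) / sum(cm))
}

# Toy example: class "c" is never predicted
observed  <- c("a", "b", "c", "a", "c")
predicted <- c("a", "b", "b", "a", "a")
res <- confusion_summary(observed, predicted)
res$confusion
res$accuracy  # 3 correct out of 5
```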