0
votes

I have a question about using random forest with X_train, y_train, X_test, and y_test.

To train the model, I use this:

rf_train <- randomForest(y = y_train, x = X_train, ntree = 1000)

My question is: which of the following is the correct way to predict on new data?

1.

randomForest(y = y_test, x = X_test, ntree = 1000)

2.

predict(rf_train, X_test)

Please tell me which one is right.

1
Have you checked ?predict.randomForest - akrun
@akrun I did :-) ... I did a bunch of work with R and the randomForest package a while back. I still actually remember some of it ^ ^ - Tim Biegeleisen

1 Answer

2
votes

In a situation like this, you can answer the question yourself by combining the function signature with your intuition and the documentation. Use predict (your option 2) to apply the random forest model to new test data. Option 1 does not predict anything: it fits a second, separate forest to the test set. As you are calling it, predict takes the fitted model returned by randomForest as its first argument and a data frame or matrix of test data, one row per test case, as its second. As the documentation mentions, the output, at least for a random forest built for regression, is a vector of responses, one for each row of the test data.
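As a rough illustration of the train-then-predict workflow, here is a minimal sketch. The mtcars data, the 24/8 split, and the seed are assumptions chosen only to make the example self-contained; they are not from the question.

```r
library(randomForest)

set.seed(42)  # assumed seed, only to make the split and the forest reproducible

# Illustrative data (an assumption): predict mpg from the other mtcars columns
idx     <- sample(nrow(mtcars), 24)
X_train <- mtcars[idx, -1]
y_train <- mtcars$mpg[idx]
X_test  <- mtcars[-idx, -1]
y_test  <- mtcars$mpg[-idx]

# Fit the forest on the training data only
rf_train <- randomForest(x = X_train, y = y_train, ntree = 1000)

# Apply the fitted model to the unseen rows
y_pred <- predict(rf_train, X_test)

# y_test is used only to evaluate the predictions, never to fit
rmse <- sqrt(mean((y_pred - y_test)^2))
print(length(y_pred) == nrow(X_test))
print(rmse)
```

Here y_test never reaches randomForest; it only serves to score the predictions afterwards, which is the practical difference between your two options.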