Fill data based on random forest object predicted response

Question

Using randomForest, I fit a model and get an RF object, e.g.:
forest <- randomForest(as.formula(generic), data=train, mtry=2)

Using predict, I can predict the response on a test dataset.
The response is one of A, B or C.

prediction <- predict(forest, newdata=test, type='class')
mytable <- table(test$class_w,prediction)
sum(mytable[row(mytable) != col(mytable)]) / sum(mytable)  # misclassification rate on the test set

Printing the forest object gives the confusion matrix:

      A     B     C    class.error
A   498    79   170    0.3333333
B   115   353   237    0.4992908
C    96    99   967    0.1678141

E.g. the test dataset:

id |class_w| valueA | valueB |
1  |  C    |  0.254 |  0.334 |
2  |  A    |  0.654 |  0.334 |
3  |  A    |  0.554 |  0.314 |
4  |  B    |  0.454 |  0.224 |
5  |  C    |  0.354 |  0.332 |
6  |  C    |  0.264 |  0.114 |
7  |  C    |  0.264 |  0.664 |

I would like to know whether I can create a new dataset with two columns: the id from the test dataset and the response the RF predicted for that row. E.g.

row id of test dataset  |  predicted response
1                       |  A  #failed
2                       |  B  #failed
3                       |  B  #failed
4                       |  B  #TRUE!

Thanks in advance for your help.

Hi joran. I mean, if it is possible, to know the predicted value for every row in the dataset and accomplish that by using a function that would mark the response like df$RF_Prediction <- predicted_value — Panos Kal.
I still don't understand. Are you just trying to do test$RF_prediction <- prediction? They should be in the same order. (And don't you mean type = "response"?) — joran
I still don't really understand your confusion. This is no different than creating any other data frame: data.frame(id = test$id,response = prediction). Or you could add it as a column to test as I noted above. — joran
Joran, I think this is what I was asking. I am just starting using R, I didn't know how to do it. Post it as an answer so I can accept it. Thanks. — Panos Kal.

joran · Accepted Answer · 2013-04-22T21:40:20

I think you may simply be looking to create a new data frame:

data.frame(id = test$id,response = prediction)

That assumes that id is in fact a column in test, rather than the row names. If the ids are the row names, then you'd want to do:

data.frame(id = rownames(test),response = prediction)
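
For completeness, here is a minimal end-to-end sketch of the above. It is only an illustration under assumptions: the built-in iris data stands in for the asker's train/test sets, the id column and the extra correct column are added for the example, and predict() is assumed to return predictions in the same row order as newdata (which is why the two columns line up).

```r
library(randomForest)

set.seed(42)

# Stand-in data (assumption): iris with an explicit id column,
# split into a training and a test set.
dat <- data.frame(id = seq_len(nrow(iris)), iris)
train_idx <- sample(nrow(dat), 100)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit the forest on the predictor columns only (id is left out on purpose).
forest <- randomForest(
  Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
  data = train, mtry = 2
)

# One predicted class per row of test, in the same order as test.
prediction <- predict(forest, newdata = test, type = "response")

# New data frame: id of each test row plus the predicted response.
result <- data.frame(id = test$id, response = prediction)

# Optional: flag whether each prediction matches the true class.
result$correct <- result$response == test$Species

head(result)
```

If you would rather keep everything in one place, test$RF_prediction <- prediction adds the same values as a new column of test.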
