Using randomForest, I get an RF object, e.g.
forest <- randomForest(as.formula(generic), data=train, mtry=2)
Using predict, I can predict the response on a test dataset. The response is either A, B or C.
prediction <- predict(forest, newdata=test, type='class')
mytable <- table(test$class_w,prediction)
sum(mytable[row(mytable) != col(mytable)]) / sum(mytable)  # show error
Calling the forest object I get the confusion matrix:
    A   B   C class.error
A 498  79 170   0.3333333
B 115 353 237   0.4992908
C  96  99 967   0.1678141
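For what it's worth, the overall error implied by that printed matrix can be recomputed by hand with the same off-diagonal trick used in the test-set snippet. This is a minimal sketch in base R: the matrix `conf` is typed in from the printout (it is not pulled from the forest object), and the variable names are only for illustration.

```r
# Counts copied by hand from the printed confusion matrix
# (rows = true class, columns = predicted class); class.error column left out.
conf <- matrix(c(498,  79, 170,
                 115, 353, 237,
                  96,  99, 967),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Per-class error: share of each row that falls off the diagonal.
class_error <- 1 - diag(conf) / rowSums(conf)

# Overall error: all off-diagonal counts divided by the total count.
overall_error <- sum(conf[row(conf) != col(conf)]) / sum(conf)

print(round(class_error, 7))
print(overall_error)
```

The per-class values reproduce the class.error column, and the overall figure is 796 misclassified out of 2614, roughly 0.305. As far as I know, the matrix printed with the forest is based on out-of-bag predictions on the training data, so it need not match the error computed on the test set.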
E.g. test dataset:
id | class_w | valueA | valueB |
1 | C | 0.254 | 0.334 |
2 | A | 0.654 | 0.334 |
3 | A | 0.554 | 0.314 |
4 | B | 0.454 | 0.224 |
5 | C | 0.354 | 0.332 |
6 | C | 0.264 | 0.114 |
7 | C | 0.264 | 0.664 |
I would like to know if I can create a new dataset with two columns: the id from the test dataset and the response predicted by the RF. E.g.
row id of test dataset | predicted response
1 | A #failed
2 | B #failed
3 | B #failed
4 | B #TRUE!
Thanks in advance for your help.
test$RF_prediction <- prediction ? They should be in the same order. (And don't you mean type = "response"?) – joran
data.frame(id = test$id, response = prediction). Or you could add it as a column to test as I noted above. – joran
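Both suggestions can be sketched as follows. This is only an illustration: the `test` data frame is retyped from the example in the question, and `prediction` is a made-up stand-in vector (in practice it would come from `predict(forest, newdata = test)`), so the predicted labels here mean nothing. The `correct` column is an extra added for convenience, not something from the comments.

```r
# Example test set retyped from the question.
test <- data.frame(
  id      = 1:7,
  class_w = factor(c("C", "A", "A", "B", "C", "C", "C"),
                   levels = c("A", "B", "C")),
  valueA  = c(0.254, 0.654, 0.554, 0.454, 0.354, 0.264, 0.264),
  valueB  = c(0.334, 0.334, 0.314, 0.224, 0.332, 0.114, 0.664)
)

# HYPOTHETICAL stand-in for: prediction <- predict(forest, newdata = test)
# predict() returns one value per row of newdata, in the same row order.
prediction <- factor(c("A", "B", "B", "B", "C", "C", "A"),
                     levels = c("A", "B", "C"))

# Option 1: a new two-column data frame (id + predicted response).
result <- data.frame(id = test$id, response = prediction)

# Option 2: add the prediction as a column of test, plus a hit/miss flag.
test$RF_prediction <- prediction
test$correct <- test$class_w == test$RF_prediction

print(result)
print(mean(!test$correct))  # share of misclassified rows
```

Because the predictions line up with the rows of `newdata`, no merge or join on `id` should be needed, as long as `test` is not reordered between predicting and binding the column.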