0
votes

I grew a random forest model (using cforest from the 'party' package) on a dataset containing approximately 1000 observations of 7 variables. The response is binary (say outcome A and outcome B) and the 6 predictors are all categorical. I would like to get the predicted probability for each of the 1000 observations, as in a logistic regression model. There, predict(yourmodel, type="response") returns the probability of the second level of the response (here outcome B), so outcome A is favoured when p<0.5 and outcome B is favoured when p>=0.5.
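
For reference, here is a minimal sketch of the logistic-regression behaviour I mean. It uses the built-in mtcars data as a stand-in for my own, and the A/B recoding and variable names are only for illustration:

```r
# Stand-in data: recode the 0/1 column 'am' as a two-level factor A/B
dat <- mtcars
dat$outcome <- factor(ifelse(dat$am == 1, "B", "A"), levels = c("A", "B"))

# Logistic regression; glm models the probability of the second level ("B")
fit <- glm(outcome ~ wt, data = dat, family = binomial)

# One fitted probability of outcome B per observation
p <- predict(fit, type = "response")

# Turn probabilities into predicted outcomes with a 0.5 cut-off
pred <- ifelse(p >= 0.5, "B", "A")
head(data.frame(p = round(p, 3), pred = pred))
```

This is the kind of per-observation probability I would like to obtain from the forest.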

It appears that when applying predict on a random forest object, I only get the predicted outcome (i.e. A or B) for each observation. Is there a workaround to retrieve the probability of the predicted response?

I thank you very much for your help!

C.B.

3 Answers

3
votes

From the documentation:

type: one of response, prob. or votes, indicating the type of output: predicted values, matrix of class probabilities, or matrix of vote counts. class is allowed, but automatically converted to "response", for backward compatibility.

So try this:

probs <- predict(FIT, newdata, type="prob")

1
votes

Now I know how to generate and extract the predicted probabilities, as one would from the predicted responses of a logistic regression:

1) Generate the predicted probabilities of both outcomes

probs <- predict(FIT, newdata, type="prob") # thanks to thc

2) Retrieve the probability of the second outcome for each row, i.e. the probability of the second level in a logistic regression:

predict.prob <- unlist(lapply(probs, "[[", 2))
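
To illustrate step 2 without needing a fitted forest, here is a small sketch on a hand-made list. I am assuming it has the same shape as the cforest output (one 1 x 2 matrix of class probabilities per observation); the values and the column names A and B are made up:

```r
# Hand-made stand-in for predict(FIT, newdata, type = "prob").
# Assumed shape: a list with one 1 x 2 matrix of class probabilities per row.
probs <- list(
  matrix(c(0.20, 0.80), nrow = 1, dimnames = list(NULL, c("A", "B"))),
  matrix(c(0.70, 0.30), nrow = 1, dimnames = list(NULL, c("A", "B"))),
  matrix(c(0.45, 0.55), nrow = 1, dimnames = list(NULL, c("A", "B")))
)

# Probability of the second outcome (B) for each observation
p_B <- vapply(probs, function(p) p[[2]], numeric(1))

# Alternatively, stack the list into an n x 2 matrix of probabilities
prob_matrix <- do.call(rbind, probs)

# Predicted outcome using the same 0.5 cut-off as in the logistic case
pred <- ifelse(p_B >= 0.5, "B", "A")
```

vapply does the same job as unlist(lapply(...)) but checks that every element yields exactly one number, which catches a malformed list early.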

I hope this will help other readers interested in how we can extract probabilities from lists.

I would like to thank both thc and tylers for their suggestions and help!

C.B.

-1
votes

I use the random forest implementation in the h2o package to train my models. When predicting, every observation is returned with a probability value (the model's confidence in its prediction).

https://cran.r-project.org/web/packages/h2o/h2o.pdf

Do take a look