
I am trying to run a random forest survival analysis following the randomForestSRC vignette in R. I have a data frame containing 59 variables, of which 14 are numeric and the rest are factors. Two of the numeric ones are TIME (days until death) and DIED (0/1, dead or not). I'm running into 2 problems:

  1. The first problem is with the error rate. Fitting the forest:

trainrfsrc <- rfsrc(Surv(TIME, DIED) ~ ., data = train, nsplit = 10, na.action = "na.impute")
trainrfsrc

                         Sample size: 3228
                    Number of deaths: 825
                     Number of trees: 1000
           Forest terminal node size: 3
       Average no. of terminal nodes: 525.427
No. of variables tried at each split: 8
              Total no. of variables: 57
                            Analysis: RSF
                              Family: surv
                      Splitting rule: logrank random
       Number of random split points: 10
                          Error rate: 17.07%

works fine, however exploring the error rate such as:

plot(gg_error(trainrfsrc)) + coord_cartesian(y = c(.09, .31))

returns:

geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?

or

a <- gg_error(trainrfsrc)
a

   error ntree
1     NA     1
2     NA     2
3     NA     3
4     NA     4
5     NA     5
6     NA     6
7     NA     7
8     NA     8
9     NA     9
10    NA    10

and so on for all 1000 trees. How come there's no error rate for each number of trees tried?

  2. The second problem occurs when I try to explore the most important variables using VIMP, such as:

plot(gg_vimp(trainrfsrc)) + theme(legend.position = c(.8,.2))+ labs(fill = "VIMP > 0")

It returns:

In gg_vimp.rfsrc(trainrfsrc) : rfsrc object does not contain VIMP information. Calculating...

Any ideas? Thanks

Help, anyone? – XPeriment

1 Answer


I came across the same problem and found the reason in the manual of ggRandomForests:

"arguments passed to the vimp.rfsrc function if the rfsrc object does not contain importance information."

You can try to fix the code like this:

plot(gg_vimp(vimp.rfsrc(trainrfsrc)))

Just wrap trainrfsrc in vimp.rfsrc(), so that the object passed to gg_vimp already contains the VIMP information. Hope it works.
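
It may also be possible to avoid both symptoms when fitting the forest. The following is only a minimal sketch, not tested against the original data. It assumes a randomForestSRC release in which rfsrc() accepts importance = TRUE and block.size = 1 (older releases used tree.err = TRUE instead of block.size, so check ?rfsrc for your version). The veteran data set shipped with randomForestSRC stands in for train, and the object name fit is made up for illustration.

```r
library(randomForestSRC)
library(ggRandomForests)
library(ggplot2)

# Stand-in data (assumption): the veteran survival data shipped with
# randomForestSRC, used here in place of the asker's `train` data frame.
data(veteran, package = "randomForestSRC")

fit <- rfsrc(Surv(time, status) ~ ., data = veteran,
             ntree = 100, nsplit = 10,
             importance = TRUE,  # store VIMP in the fitted object
             block.size = 1)     # cumulative OOB error after every tree
                                 # (assumption: older releases use tree.err = TRUE)

# Both pieces should now be stored in the forest object itself.
head(fit$err.rate)
fit$importance

# With these stored, the plots need no extra calculation step.
plot(gg_error(fit))
plot(gg_vimp(fit))
```

If this behaves as assumed, gg_error() should get one error value per tree instead of NA, and gg_vimp() should no longer report that VIMP information is missing.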