I am trying to perform a random forest survival analysis following the randomForestSRC vignette in R. I have a data frame containing 59 variables, of which 14 are numeric and the rest are factors. Two of the numeric ones are TIME (days until death) and DIED (0/1, dead or not). I'm running into two problems:

1. The following call:
trainrfsrc <- rfsrc(Surv(TIME, DIED) ~ ., data = train, nsplit = 10, na.action = "na.impute")
trainrfsrc
Sample size: 3228
Number of deaths: 825
Number of trees: 1000
Forest terminal node size: 3
Average no. of terminal nodes: 525.427
No. of variables tried at each split: 8
Total no. of variables: 57
Analysis: RSF
Family: surv
Splitting rule: logrank random
Number of random split points: 10
Error rate: 17.07%
works fine. However, exploring the error rate with:
plot(gg_error(trainrfsrc)) + coord_cartesian(y = c(.09, .31))

returns:

geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
or

a <- gg_error(trainrfsrc)
a
   error ntree
1     NA     1
2     NA     2
3     NA     3
4     NA     4
5     NA     5
6     NA     6
7     NA     7
8     NA     8
9     NA     9
10    NA    10

and so on for all 1000 trees. How come there is no error rate for each number of trees tried?
2. The second problem occurs when trying to explore the most important variables using VIMP:
plot(gg_vimp(trainrfsrc)) + theme(legend.position = c(.8, .2)) + labs(fill = "VIMP > 0")
It returns:

In gg_vimp.rfsrc(trainrfsrc) :
  rfsrc object does not contain VIMP information. Calculating...
Any ideas? Thanks