I am a beginner trying to do survival analysis using machine learning on the lung cancer dataset. I know how to do survival analysis with the Cox proportional hazards model. The Cox model gives us hazard ratios, which are simply the exponentials of the regression coefficients. I wonder whether we can get the same thing using machine learning. As a beginner, I am trying survivalsvm in R. Please see the link for this. I am using the built-in cancer data for the survival analysis. The following R code is given at this link.

library(survival)
library(survivalsvm)

set.seed(123)
n <- nrow(veteran)
train.index <- sample(1:n, 0.7 * n, replace = FALSE)
test.index <- setdiff(1:n, train.index)
survsvm.reg <- survivalsvm(Surv(diagtime, status) ~ ., 
                            subset = train.index, data = veteran,
                            type = "regression", gamma.mu = 1,
                            opt.meth = "quadprog", kernel = "add_kernel")
print(survsvm.reg)
pred.survsvm.reg <- predict(object = survsvm.reg,
                             newdata = veteran, subset = test.index)
print(pred.survsvm.reg)

Can anyone help me get the hazard ratios or a survival curve for this dataset? Also, how do I interpret the output of this function?

1 Answer

This question is kind of old now but I'm going to answer anyway because this is a difficult problem and I struggled with {survivalsvm} when I first used it.

Depending on the type argument, you get different outputs. In your case, type = "regression" means you are fitting Shivaswamy's SVCR (hope I spelt that correctly), which predicts the time until an event takes place, so the values returned by predict() are survival time predictions rather than hazards or survival probabilities.

In order to convert this to a survival curve you have to make some assumptions about the shape of the survival distribution. So for example, let's say you think the survival time is Normally distributed with N(mu, sigma). Then you can use your predicted survival time as mu and either predict or make an assumption about sigma.

Below is an example using your code and my {distr6} package, which makes it quick to work with many distributions at once and to print and plot their functions:

library(survival)
library(survivalsvm)
set.seed(123)
n <- nrow(veteran)
train.index <- sample(1:n, 0.7 * n, replace = FALSE)
test.index <- setdiff(1:n, train.index)
survsvm.reg <- survivalsvm(Surv(diagtime, status) ~ ., 
                           subset = train.index, data = veteran,
                           type = "regression", gamma.mu = 1,
                           opt.meth = "quadprog", kernel = "add_kernel")
print(survsvm.reg)
pred.survsvm.reg <- predict(object = survsvm.reg,
                            newdata = veteran, subset = test.index)


# load distr6
library(distr6)

# create a vector of Normal distributions, each with
# mean equal to the predicted time and with variance 1 (an assumption)
# `decorators = "ExoticStatistics"` adds the survival and hazard functions
v <- VectorDistribution$new(
  distribution = "Normal",
  params = data.frame(mean = as.numeric(pred.survsvm.reg$predicted)),
  shared_params = list(var = 1),
  decorators = "ExoticStatistics"
)
# survival function evaluated at times = 1:10
v$survival(1:10)
# plot survival function for first individual
plot(v[1], fun = "survival")
# plot hazard function for first individual
plot(v[1], fun = "hazard")
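
For comparison, the same Normal assumption can be sketched in base R without {distr6}. This is only an illustration. The values in `mu` are made-up stand-ins for `pred.survsvm.reg$predicted`, `sigma = 1` is an assumption and not an estimate, and the helper names `surv_normal` and `haz_normal` are hypothetical. Note also that a Normal distribution puts some probability on negative times, so it is a convenient choice here and not necessarily a realistic one.

```r
# Hypothetical predicted survival times, standing in for
# as.numeric(pred.survsvm.reg$predicted)
mu <- c(5, 8, 12)
sigma <- 1          # assumed standard deviation, not estimated from the data
times <- 1:10       # time points at which to evaluate the curves

# Survival function under the Normal assumption: S(t) = P(T > t)
surv_normal <- function(t, mu, sigma) {
  pnorm(t, mean = mu, sd = sigma, lower.tail = FALSE)
}

# Hazard function: h(t) = f(t) / S(t)
haz_normal <- function(t, mu, sigma) {
  dnorm(t, mean = mu, sd = sigma) / surv_normal(t, mu, sigma)
}

# One column per individual, one row per time point
surv_mat <- sapply(mu, function(m) surv_normal(times, m, sigma))
haz_mat  <- sapply(mu, function(m) haz_normal(times, m, sigma))
```

Each column of `surv_mat` is then one individual's survival curve, so something like `plot(times, surv_mat[, 1], type = "l")` draws the curve for the first individual. By construction, each curve crosses 0.5 at that individual's predicted survival time.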