3
votes

I would like to use GLMNET to fit a binomial logistic regression model. I could use either caret or the glmnet-package directly. Lets take the data(BinomialExample) as an example to execute the following code where I´ve implemented both:

#rm(list = ls(all.names = TRUE))

library(glmnet)    
library(caret)
data(BinomialExample)

y[y==0] = "low"
y[y==1] = "high"
y <- as.factor(y)

#split data in training & validation set

set.seed(1)
splitSample <- createDataPartition(y, p = 0.8, list = FALSE)
training_expression <- x[splitSample,]
training_phenotype <- y[splitSample]
validation_expression <- x[-splitSample,]
validation_phenotype <- y[-splitSample]

#####################
##GLMNET with CARET##
#####################
eGrid <- expand.grid(.alpha=seq(0.1,0.9, by=0.1),.lambda=seq(0,1,by=0.01))
Control <- trainControl(verboseIter=TRUE, classProbs=TRUE, summaryFunction=twoClassSummary, method="cv") 

set.seed(1)
netFit <- train(x = training_expression, y = training_phenotype,method = "glmnet", metric = "ROC", tuneGrid=eGrid,trControl = Control)
netFitPerf <- getTrainPerf(netFit) 
trainROC <- netFitPerf[,1]
trainSens <- netFitPerf[,2]
trainSpec <- netFitPerf[,3] 
trainAlpha <- netFit$bestTune[,1]
trainLambda <- netFit$bestTune[,2]
print(sprintf("ROC: %s Sens: %s Spec: %s Alpha: %s Lambda: %s", round(trainROC,2), round(trainSens,2), round(trainSpec,2), round(trainAlpha,2),round(trainLambda,2))) 

predict_validation <- predict(netFit, newdata = validation_expression)
confusionMatrix(predict_validation,validation_phenotype)

######################
#GLMNET without CARET#
######################
set.seed(1)
elasticnet <- cv.glmnet(training_expression, training_phenotype, family = "binomial", type.measure = "class", nfolds=10, alpha=0.5, nlambda = 100) 
plot(elasticnet)
predict_validation <- predict(elasticnet, newx = validation_expression, s = c(elasticnet$lambda.min), type = "class")
confusionMatrix(predict_validation,validation_phenotype)

As you can see if I use the caret packet I can easily print the ROC, Sensitivity and Specificity of the model. However I was not able to find a similar way to print ROC, Sens, Spec if I use glmnet directly without CARET - is there a similar way to get these metrics?

Thanks for your help!

2
Although it is possible, why do you want to do this if caret accomplishes your goal? Is this just curiosity?cdeterman
Well I could use caret in this particular case but in the past I´ve always used glmnet directly (because there are situtations in which caret is not useful (e.g. when fitting a Cox regression model)). So I would still be curious how I could calculate the mentioned metrics also without using caret.user86533

2 Answers

4
votes

You can get the values you want from various objects produced by your glmnet workflow. For example, if you do

cm = confusionMatrix(predict_validation,validation_phenotype)

then cm$byClass includes Specificity and Sensitivity:

cm$byClass
     Sensitivity          Specificity       Pos Pred Value       Neg Pred Value           Prevalence 
       0.8181818            1.0000000            1.0000000            0.8000000            0.5789474 
  Detection Rate Detection Prevalence    Balanced Accuracy 
       0.4736842            0.4736842            0.9090909 

Likewise, you can get Lambda from elasticnet$lambda.min and alpha from gsub(".*alpha = ([0-9]\\.[0-9]*).*","\\1",deparse(elasticnet$glmnet.fit$call)[2]) (although there may be a better way than that monstrous piece of code). Actually, since the alpha value is an input to the function, you don't even need to extract it. However, if you cross-validate on alpha in addition to lambda you'd need to use a loop to try out multiple alpha values and then you'd need some way to extract the alpha value of the best model. If you decide to include alpha in the cross-validation, be sure to read the Details section of cv.glmnet.

For the AUC of the ROC curve, cv.glmnet will give you that, but you'd need to use type.measure="auc" instead of type.measure="class", which would change how the best model is selected. Also, with this particular data sample, you need to use fewer CV folds, but that might not be an issue with your real data. For example:

elasticnet <- cv.glmnet(training_expression, training_phenotype, family = "binomial", 
                        type.measure = "auc", nfolds=5, alpha=0.5, nlambda = 100) 

Then, to get the AUC:

elasticnet$cvm[which(elasticnet$lambda==elasticnet$lambda.min)] 

or

max(elasticnet$cvm)

If you want to calculate the AUC without using AUC to select the best model, you might have to calculate that yourself or use a pre-existing function for this, such as auc from the pROC package.

0
votes

You can easily see the AUC and other measures like MSE using the assess.glmnet function directly.