When estimating a lasso model via the glmnet package, I am wondering whether it is better to: (a) pull coefficients / predictions / deviance straight from the cvfit object returned by cv.glmnet, or (b) use the minimum lambda from cv.glmnet to re-run glmnet and pull these objects from that glmnet fit. (Please be patient -- I have a feeling this is documented somewhere, but I'm seeing examples/tutorials of both approaches online, with no solid rationale for going one way or the other.)
That is, for coefficients, I can run (a):
# (a) cross-validate over a lambda sequence, then extract directly at lambda.min
cvfit <- cv.glmnet(x = xtrain, y = ytrain, alpha = 1, type.measure = "mse", nfolds = 20)
coef.cv <- coef(cvfit, s = "lambda.min")
Or I can afterwards run (b):
# (b) re-fit glmnet at the single lambda selected by cross-validation
fit <- glmnet(x = xtrain, y = ytrain, alpha = 1, lambda = cvfit$lambda.min)
coef.fit <- coef(fit)  # this fit contains only one lambda, so no s argument is needed
While these two processes select the same model variables, they do not produce identical coefficients. Similarly, I could predict via either of the following two processes:
prdct <- predict(fit, newx = xtest)                        # from the single-lambda re-fit
prdct.cv <- predict(cvfit, newx = xtest, s = "lambda.min") # from the cv.glmnet object
And they predict similar but NOT identical vectors.
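For reference, here is the kind of quick, informal check I mean (just an illustration, using the objects created above):

max(abs(as.numeric(coef.cv) - as.numeric(coef.fit)))  # coefficient discrepancy
max(abs(prdct - prdct.cv))                             # prediction discrepancy on xtest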
Last, I would have THOUGHT I could pull % deviance explained via either of the two methods:
percdev <- fit$dev.ratio
percdev.cv <- cvfit$glmnet.fit$dev.ratio[cvfit$cvm == min(cvfit$cvm)]
But in fact, it is not possible to pull percdev.cv in this way, because if the lambda sequence used by cv.glmnet has fewer than 100 elements, the lengths of cvfit$glmnet.fit$dev.ratio and cvfit$cvm don't match, so the logical index doesn't line up. So I'm not quite sure how to pull the minimum-lambda dev.ratio from cvfit$glmnet.fit.
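The closest workaround I can come up with is to index dev.ratio against the lambda sequence stored inside glmnet.fit rather than against cvm, though I'm not sure this is the intended approach:

# tentative workaround: match lambda.min against the lambda sequence that dev.ratio is indexed by
percdev.cv <- cvfit$glmnet.fit$dev.ratio[which(cvfit$glmnet.fit$lambda == cvfit$lambda.min)]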
So I guess I'm wondering which process is best, why, and how people normally pull the appropriate dev.ratio statistic. Thanks!
?glmnet, the doc for lambda says: "WARNING: use with care. Avoid supplying a single value for lambda (for predictions after CV use predict() instead). Supply instead a decreasing sequence of lambda values. glmnet relies on its warm starts for speed, and it's often faster to fit a whole path than compute a single fit." So I guess this partly answers your question about which to use. - acylam

glmnet should be used with a (default or supplied) lambda sequence, but once such a sequence has been supplied to cv.glmnet, and an "optimal" lambda (lambda.1se or lambda.min) obtained, one would think that using that lambda would result in identical glmnet results as it did under cv.glmnet, even if slower to calculate. Additionally, I do have a hunch that it's probably better to obtain coefficients and predictions from cv.glmnet, but I am not sure how to obtain dev.ratio from cv.glmnet. - Leah Bevis