When estimating a lasso model via the glmnet package, I am wondering whether it is better to: (a) pull coefficients / predictions / deviance straight from the cvfit object returned by cv.glmnet, or (b) use the minimum lambda from cv.glmnet to re-run glmnet and pull these objects from that glmnet fit. (Please be patient -- I have a feeling this is documented somewhere, but I'm seeing examples/tutorials of both approaches online, with no solid rationale for going one way or the other.)
That is, for coefficients, I can run (a):
# (a) cross-validate over a lambda sequence, then extract directly at lambda.min
cvfit <- cv.glmnet(x = xtrain, y = ytrain, alpha = 1, type.measure = "mse", nfolds = 20)
coef.cv <- coef(cvfit, s = "lambda.min")
Or I can afterwards run (b):
# (b) re-fit glmnet at the single lambda selected by cross-validation
fit <- glmnet(x = xtrain, y = ytrain, alpha = 1, lambda = cvfit$lambda.min)
coef.fit <- coef(fit)  # this fit contains only one lambda, so no s argument is needed
While these two processes select the same model variables, they do not produce identical coefficients. Similarly, I could predict via either of the following two processes:
prdct <- predict(fit, newx = xtest)                        # from the single-lambda re-fit
prdct.cv <- predict(cvfit, newx = xtest, s = "lambda.min") # from the cv.glmnet object
And they predict similar but NOT identical vectors.
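For reference, here is the kind of quick, informal check I mean (just an illustration, using the objects created above):

max(abs(as.numeric(coef.cv) - as.numeric(coef.fit)))  # coefficient discrepancy
max(abs(prdct - prdct.cv))                             # prediction discrepancy on xtest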
Last, I would have THOUGHT I could pull % deviance explained via either of the two methods:
percdev <- fit$dev.ratio
percdev.cv <- cvfit$glmnet.fit$dev.ratio[cvfit$cvm == min(cvfit$cvm)]
But in fact, it is not possible to pull percdev.cv in this way, because if the lambda sequence used by cv.glmnet has fewer than 100 elements, the lengths of cvfit$glmnet.fit$dev.ratio and cvfit$cvm don't match, so the logical index doesn't line up. So I'm not quite sure how to pull the minimum-lambda dev.ratio from cvfit$glmnet.fit.
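The closest workaround I can come up with is to index dev.ratio against the lambda sequence stored inside glmnet.fit rather than against cvm, though I'm not sure this is the intended approach:

# tentative workaround: match lambda.min against the lambda sequence that dev.ratio is indexed by
percdev.cv <- cvfit$glmnet.fit$dev.ratio[which(cvfit$glmnet.fit$lambda == cvfit$lambda.min)]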
So I guess I'm wondering which process is best, why, and how people normally pull the appropriate dev.ratio statistic. Thanks!
?glmnet, the doc for lambda says: "WARNING: use with care. Avoid supplying a single value for lambda (for predictions after CV use predict() instead). Supply instead a decreasing sequence of lambda values. glmnet relies on its warm starts for speed, and it's often faster to fit a whole path than compute a single fit." So I guess this partly answers your question about which to use. - acylam

glmnet should be used with a (default or supplied) lambda sequence, but once such a sequence has been supplied to cv.glmnet, and an "optimal" lambda (lambda.1se or lambda.min) obtained, one would think that using that lambda would result in identical glmnet results as it did under cv.glmnet, even if slower to calculate. Additionally, I do have a hunch that it's probably better to obtain coefficients and predictions from cv.glmnet, but I am not sure how to obtain dev.ratio from cv.glmnet. - Leah Bevis