3
votes

I ran a 20-fold cv.glmnet lasso model to obtain the "optimal" value for lambda. However, when I attempt to reproduce the results with glmnet(), I get warnings that read:

Warning messages:
1: from glmnet Fortran code (error code -1); Convergence for 1th lambda
   value not reached after maxit=100000 iterations; solutions for larger 
   lambdas returned 
2: In getcoef(fit, nvars, nx, vnames) :
   an empty model has been returned; probably a convergence issue

My code reads as such:

set.seed(5)
cv.out <- cv.glmnet(x[train,],y[train],family="binomial",nfolds=20,alpha=1,parallel=TRUE)
coef(cv.out)
bestlam <- cv.out$lambda.min
lasso.mod.best <- glmnet(x[train,],y[train],alpha=1,family="binomial",lambda=bestlam)

Now, the value of bestlam above is 2.976023e-05, so perhaps this is causing the problem? Is it a rounding issue with the value of lambda? Is there a reason why I can't reproduce the results directly from the glmnet() function? If I use a vector of lambda values in a similar range to this value of bestlam, I do not have any issues.


2 Answers

4
votes

You're passing a single lambda to glmnet (lambda=bestlam), which is a big no-no: you're attempting to fit a model using just one lambda value, so the solver has no warm starts from larger lambdas to work from.

From the glmnet documentation (?glmnet):

lambda: A user supplied lambda sequence. Typical usage is to have the 
program compute its own lambda sequence based on nlambda and 
lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use 
with care. Do not supply a single value for lambda (for predictions after CV 
use predict() instead). Supply instead a decreasing sequence of lambda 
values. glmnet relies on its warms starts for speed, and its often faster to 
fit a whole path than compute a single fit.
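
In practice, that advice could look something like the sketch below. It uses simulated data: x_sim and y_sim are hypothetical stand-ins for x[train,] and y[train], and parallel=TRUE is left out so that no parallel backend is needed.

```r
library(glmnet)

set.seed(5)
# Simulated stand-in data (hypothetical; replace with x[train, ] and y[train])
n <- 200
p <- 20
x_sim <- matrix(rnorm(n * p), n, p)
y_sim <- rbinom(n, 1, plogis(x_sim[, 1] - 0.5 * x_sim[, 2]))

cv.out <- cv.glmnet(x_sim, y_sim, family = "binomial", nfolds = 20, alpha = 1)

# Coefficients at the CV-optimal lambda, read off the path that was already fitted
coef.best <- coef(cv.out, s = "lambda.min")

# Predicted probabilities at the same lambda
prob.best <- predict(cv.out, newx = x_sim, s = "lambda.min", type = "response")
```

Note that coef(cv.out) with no s argument defaults to s="lambda.1se", so the coef(cv.out) call in the question does not return the coefficients at lambda.min.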
2
votes

glmnet is a little tricky in that respect - you'll want to run your best model with a series of lambdas (e.g., set nlambda=101), and then when you predict set s=bestlam and exact=FALSE.
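
A rough sketch of that approach, again on simulated data (x_sim and y_sim are hypothetical stand-ins for x[train,] and y[train]):

```r
library(glmnet)

set.seed(5)
# Simulated stand-in data (hypothetical; replace with x[train, ] and y[train])
n <- 200
p <- 20
x_sim <- matrix(rnorm(n * p), n, p)
y_sim <- rbinom(n, 1, plogis(x_sim[, 1] - 0.5 * x_sim[, 2]))

cv.out <- cv.glmnet(x_sim, y_sim, family = "binomial", nfolds = 20, alpha = 1)
bestlam <- cv.out$lambda.min

# Fit the whole regularization path; glmnet chooses its own decreasing lambda sequence
lasso.mod <- glmnet(x_sim, y_sim, alpha = 1, family = "binomial", nlambda = 101)

# Extract results at bestlam; exact = FALSE interpolates between neighbouring path fits
coef.best <- coef(lasso.mod, s = bestlam, exact = FALSE)
prob.best <- predict(lasso.mod, newx = x_sim, s = bestlam,
                     type = "response", exact = FALSE)
```

Because exact=FALSE interpolates between the lambdas on the fitted path, the coefficients can differ slightly from those cv.glmnet reports. The path that cv.glmnet fitted on the full data is also available as cv.out$glmnet.fit, which avoids refitting altogether.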