3
votes

Even though I am using the same lambda, it seems the coefficients generated by cv.glmnet() are different from those produced by glmnet(). Why is this? Shouldn't they be the same?

library(glmnet)

# Data dimensions
num.samples <- 30
num.genes <- 17000

# Data objects - note that both X and Y are scaled
set.seed(123)
Y <- matrix(rnorm(num.samples), ncol=1)
set.seed(1234)
X <- matrix(rnorm(num.samples*num.genes), ncol=num.genes)

# Run cv.glmnet: get lambda.min and coef
fit.cv <- cv.glmnet(X, Y, nfolds=num.samples, intercept=FALSE)
fit.cv.lambda <- fit.cv$lambda.min
fit.cv.coef <- coef(fit.cv, s = fit.cv.lambda)[,1][2:(num.genes+1)]

# Run glmnet with lambda.min from cv.glmnet: get coef
second.lambda=fit.cv.lambda-0.0001 ## second.lambda included because glmnet manual recommends using >1 lambda for glmnet()
fit <- glmnet(X, Y, lambda=c(fit.cv.lambda,second.lambda), intercept=FALSE) 
fit.lambda <- fit$lambda[1]
fit.coef <- coef(fit, s = fit.cv.lambda)[,1][2:(num.genes+1)]

# Lambda is the same, but coefficients are not
fit.cv.lambda==fit.lambda ## TRUE
not.equal = which(fit.cv.coef != fit.coef)
length(not.equal) ## 18
mean(abs(fit.cv.coef[not.equal] - fit.coef[not.equal])) ## 0.0004038209

(I've also noticed that coefficients from glmnet() and cv.glmnet() are not different at a certain values of alpha, but there doesn't seem to be an obvious pattern to this)

1
Wow! Err... that's a good one. At another pair of random seeds I tried the parameters are the same. From the R code it looks like the cv.glmnet 'glmnet' object and the glmnet 'glmnet' object look like they are being called in the same way... - CSJCampbell
This is likely relevant: stackoverflow.com/questions/18467173/…. Specifically, the model is not deterministic. - jmuhlenkamp

1 Answers

1
votes

Short Answer: This is a numerical accuracy issue. The discrepancies you are encountering are not due to differences between cv.glmnet and glmnet. Rather they are due to a combination of the following:

  • The penalty paths, lambda, are different between the two objects, by this I mean the entire penalty path, not just whether or not the penalty of interest is in both
  • The default convergence threshold, thresh =

If you want the estimates obtained from two glmnet or cv.glmnet objects that have different penalty paths to be equal (or at least very close), use the thresh = option in both functions to decrease the convergence threshold. In addition, I recommend setting exact = TRUE in coef().

Extended Answer: Below we show this in a few examples. Before doing so, it is also important to know the logic of the coef() function (which calls the predict.glmnet() function with type = "coefficients").

  • If you request coefficient estimates for a penalty that was already computed in the original penalty path of the object, coef() will simply return the estimates from the original object.

  • If you request coefficient estimates for a penalty that is not in the original path and exact = FALSE (this is the default), the coefficients are estimated by interpolation based on estimates from the nearest penalties in the original path.

  • If you request coefficient estimates for a penalty that was not already in the original path and exact = TRUE, the new penalty is added to the original path and the entire model is refit to obtain the estimates

Example 1: Same penalty paths, Same Estimates

If default arguments are used, glmnet() and cv.glmnet() will compute the same penalty path for a given set of data (the number of penalties computed in the path may differ between them due to stopping criterion though). We show this below:

library(glmnet)

# Data dimensions
num.samples <- 30
num.genes <- 17000

# Data objects - note that both X and Y are scaled
set.seed(123)
Y <- matrix(rnorm(num.samples), ncol=1)
X <- matrix(rnorm(num.samples*num.genes), ncol=num.genes)

# Run cv.glmnet and glmnet, obtain same penalty path up to min(num penalty)
fit <- glmnet(X, Y, intercept=FALSE)
cvfit <- cv.glmnet(X, Y, intercept= FALSE)
min_num_lambdas = min(length(fit$lambda), length(cvfit$lambda))
all.equal(fit$lambda[1:min_num_lambdas], cvfit$lambda[1:min_num_lambdas]) #TRUE

We can then use coef() to get the same estimates from both objects regardless of whether or not the new penalty (s) is in the original path.

# Requested penalty in original path
coef1 <- coef(fit, s = fit$lambda[10])
coef2 <- coef(cvfit, s = fit$lambda[10])
all.equal(coef1, coef2) #TRUE

# Requested penalty not in original path -- uses interpolation
coef1 <- coef(fit, s = 0.40)
coef2 <- coef(cvfit, s = 0.40)
all.equal(coef1, coef2) #TRUE

# Force glmnet to refit the model with s added to the penalty path
coef1 <- coef(fit, s = 0.40, exact = TRUE, x = X, y = Y)
coef2 <- coef(cvfit, s = 0.40, exact = TRUE, x = X, y = Y)
all.equal(coef1, coef2) #TRUE

Example 2: Different penalty paths, Different Estimates

We make a small change to the glmnet() fit by requesting 99 penalties in the path instead of 100. This makes the penalty paths differ between fit and cvfit. Now the estimates are different even if we use the exact = TRUE option.

fit <- glmnet(X, Y, intercept=FALSE, nlambda = 99)
coef1 <- coef(fit, s = fit$lambda[10], exact = TRUE, x = X, y = Y)
coef2 <- coef(cvfit, s = fit$lambda[10], exact = TRUE, x = X, y = Y)
all.equal(coef1, coef2) # Mean relative difference: 0.002006215

The Fix

To ensure the estimates match, we can refit both objects with a decreased threshold.

fit <- glmnet(X, Y, intercept=FALSE, nlambda = 99, thresh = 1e-20)
cvfit <- cv.glmnet(X, Y, intercept= FALSE, thresh = 1e-20)
coef1 <- coef(fit, s = fit$lambda[10], exact = TRUE, x = X, y = Y)
coef2 <- coef(cvfit, s = fit$lambda[10], exact = TRUE, x = X, y = Y)
all.equal(coef1, coef2) #TRUE

Note that you could also use the thresh = option in coef(), but this will only work if the new penalty is not already in the original path of either object.

fit <- glmnet(X, Y, intercept=FALSE, nlambda = 99)
cvfit <- cv.glmnet(X, Y, intercept= FALSE)
coef1 <- coef(fit, s = 0.40, exact = TRUE, x = X, y = Y, thresh = 1e-20)
coef2 <- coef(cvfit, s = 0.40, exact = TRUE, x = X, y = Y, thresh = 1e-20)
all.equal(coef1, coef2) #TRUE

The value that thresh = needs to be decreased to will be entirely data-dependent.