I have a query about the cv.glmnet() function in R, which is supposed to find the "optimum" value of the regularization parameter lambda for ridge regression.
In the example code below, if you experiment with values of lambda smaller than the one cv.glmnet() selects, you will find that the error sum of squares is actually much smaller than at cv.fit$lambda.min (I sketch a small sweep over the grid after the code).
I have noticed this with many datasets. Even the example in the well-known book "An Introduction to Statistical Learning" (ISLR) by Gareth James et al. has this problem (Section 6.6.1, using the Hitters dataset): the value of lambda that actually minimizes the MSE is smaller than the one the book reports. This holds both on the training data and on new test data.
What is the reason for this? What exactly does cv.fit$lambda.min return?
Ravi
library(glmnet)   # glmnet() and cv.glmnet() come from the glmnet package

data(mtcars)
y = mtcars$hp
X = model.matrix(hp ~ mpg + wt + drat, data = mtcars)[, -1]
X   # inspect the design matrix (intercept column dropped)

# grid of candidate lambdas, from 10^3 down to 10^-2
lambdas = 10^seq(3, -2, by = -.1)

# ridge regression (alpha = 0) over the full lambda grid
fit = glmnet(X, y, alpha = 0, lambda = lambdas)
summary(fit)

# cross-validation over the same grid
cv.fit = cv.glmnet(X, y, alpha = 0, lambda = lambdas)

# what is the optimum value of lambda?
(opt.lambda = cv.fit$lambda.min)   # 1.995262

# predicting at a much smaller lambda gives a lower SSE on the training data
y.pred = predict(fit, s = 0.01, newx = X, exact = TRUE)

# Sum of Squares Error on the training data
(sse = sum((y.pred - y)^2))
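To make the comparison concrete, here is the small sweep I mentioned above. It is only a sketch: train.sse is just an illustrative name; fit, cv.fit, X, y and lambdas are the objects defined above; and cvm and lambda are fields that cv.glmnet() returns.

# training-set SSE at every lambda on the grid
train.sse = sapply(lambdas, function(l) {
  yhat = predict(fit, s = l, newx = X)   # training-set predictions at this lambda
  sum((yhat - y)^2)                      # training SSE at this lambda
})

# lambda with the smallest training SSE: it ends up at the smallest value
# on the grid, since the training error only decreases as lambda shrinks
lambdas[which.min(train.sse)]

# lambda at the minimum of the cross-validated MSE curve; this matches
# cv.fit$lambda.min
cv.fit$lambda[which.min(cv.fit$cvm)]

So the training SSE and the cross-validated MSE clearly pick different lambdas here, which is exactly the discrepancy I am asking about.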