I have a query about the cv.glmnet() function in R, which is supposed to find the "optimum" value of the regularization parameter lambda for ridge regression.
In the example code below, if you experiment with values of lambda smaller than the one cv.glmnet() selects, you will find that the error sum of squares is actually much smaller than at cv.fit$lambda.min (I sketch a small sweep over the grid after the code).
I have noticed this with many datasets. Even the example in the well-known book "An Introduction to Statistical Learning" (ISLR) by Gareth James et al. has this problem (Section 6.6.1, using the Hitters dataset): the value of lambda that actually minimizes the MSE is smaller than the one the book reports. This holds both on the training data and on new test data.
What is the reason for this? What exactly does cv.fit$lambda.min return?
Ravi
library(glmnet)   # glmnet() and cv.glmnet() come from the glmnet package

data(mtcars)
y = mtcars$hp
X = model.matrix(hp ~ mpg + wt + drat, data = mtcars)[, -1]
X   # inspect the design matrix (intercept column dropped)

# grid of candidate lambdas, from 10^3 down to 10^-2
lambdas = 10^seq(3, -2, by = -.1)

# ridge regression (alpha = 0) over the full lambda grid
fit = glmnet(X, y, alpha = 0, lambda = lambdas)
summary(fit)

# cross-validation over the same grid
cv.fit = cv.glmnet(X, y, alpha = 0, lambda = lambdas)

# what is the optimum value of lambda?
(opt.lambda = cv.fit$lambda.min)   # 1.995262

# predicting at a much smaller lambda gives a lower SSE on the training data
y.pred = predict(fit, s = 0.01, newx = X, exact = TRUE)

# Sum of Squares Error on the training data
(sse = sum((y.pred - y)^2))
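To make the comparison concrete, here is the small sweep I mentioned above. It is only a sketch: train.sse is just an illustrative name; fit, cv.fit, X, y and lambdas are the objects defined above; and cvm and lambda are fields that cv.glmnet() returns.

# training-set SSE at every lambda on the grid
train.sse = sapply(lambdas, function(l) {
  yhat = predict(fit, s = l, newx = X)   # training-set predictions at this lambda
  sum((yhat - y)^2)                      # training SSE at this lambda
})

# lambda with the smallest training SSE: it ends up at the smallest value
# on the grid, since the training error only decreases as lambda shrinks
lambdas[which.min(train.sse)]

# lambda at the minimum of the cross-validated MSE curve; this matches
# cv.fit$lambda.min
cv.fit$lambda[which.min(cv.fit$cvm)]

So the training SSE and the cross-validated MSE clearly pick different lambdas here, which is exactly the discrepancy I am asking about.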