I get seemingly random crashes in package glmnet (versions 2.0.10 and 2.0.13, at least) when running cv.glmnet with a ridge logistic regression. A reproducible example is provided below; as you will see, whether it crashes depends on the chosen random seed.
The error occurs in cv.lognet() because sometimes nlami==0. This happens when the range of the global (not cross-validated) lambda sequence ([14.3; 20.7] in the example below) lies entirely below the range of the lambda sequence fitted on one of the folds (fold 4, [22.4; 32.5]).
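To double-check this diagnosis, one can refit glmnet on the full data and on each training fold and compare the lambda ranges. The sketch below assumes cv.glmnet's default fold assignment (sample(rep(seq(nfolds), length.out = N)) drawn immediately after the seed), so the index of the offending fold may differ on your machine:

# Sketch: compare the global lambda sequence with each training fold's sequence
set.seed(1)
foldid = sample(rep(seq(5), length.out = nrow(x)))   # assumed fold assignment
fit_all = glmnet(x, y, family = "binomial", alpha = 0, standardize = FALSE)
print(range(fit_all$lambda))                         # global sequence, about [14.3, 20.7]
for (k in seq(5)) {
  fit_k = glmnet(x[foldid != k, ], y[foldid != k], family = "binomial",
                 alpha = 0, standardize = FALSE)
  cat("fold", k, ":", range(fit_k$lambda), "\n")     # in my run, one fold extends above the global range
}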
A possible fix would be to force nlami>=1 by changing the definition of which_lam as follows:
which_lam = lambda >= min(mlami, max(lambda))
Since max(lambda) always satisfies this condition, which_lam would then select at least one lambda and the crash would be avoided; but I am not sure whether the results would still be correct. Can anybody confirm this, or propose another fix?
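As a tiny numeric illustration of why this guarantees nlami >= 1 (the values below are made up; only the endpoints match the ranges quoted above):

# Illustration only: a global sequence spanning [14.3, 20.7] vs mlami = 22.4
lambda = c(20.7, 19.0, 17.2, 15.7, 14.3)
mlami  = 22.4
sum(lambda >= mlami)                      # 0 -> nlami == 0, hence the crash
sum(lambda >= min(mlami, max(lambda)))    # 1 -> at least one lambda is kept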
NB: this seems related to the unresolved question "cv.glmnet fails for ridge, not lasso, for simulated data with coder error".
Reproducible example
library(glmnet)
x=structure(c(0.294819653005975, -0.755878041644385, -0.460947383309942,
-1.25359210780316, -0.643969512320233, -0.146301489038128, -0.190235360501265,
-0.778418128295596, -0.659228201713315, -0.589987067456389, 1.33064976036166,
-0.232480434360983, -0.374383490492533, -0.504817187501063, -0.558531620483801,
2.16732105550181, 0.238948891919474, -0.857229316573454, -0.673919980092841,
1.17924306872964, 0.831719897152008, -1.15770770325374, 2.54984789196214,
-0.970167597835476, -0.557900637238063, -0.432268012373971, 1.15479761345536,
1.72197312745038, -0.460658453148444, -1.17746101934592, 0.411060691690596,
0.172735774511478, 0.328416881299735, 2.13514661730084, -0.498720272451663,
0.290967756655844, -0.87284566376257, -0.652533179632676, -0.89323787137697,
-0.566883371886824, -1.1794485033936, 0.821276174960557, -0.396480750015741,
-0.121609740429242, -0.464060359619162, 0.0396628676584573, -0.942871230138644,
0.160331360905244, -0.369955203694528, -0.192318421900764, -1.39309898491775,
-0.264395753844046, 2.25142560078458, -0.897873918532094, -0.159680604037913,
-0.918027468751383, 0.43181753901048, 1.56060286954228, -0.617456504201816,
1.73106033616784, -0.97099289786049, -1.09325650121771, -0.0407358272757967,
0.553103582991963, 1.15479545417553, 0.36144086171342, -1.35507249278068,
1.37684903500442, 0.755599287825675, 0.820363089698391, 1.65541232241803,
-0.692008406375665, 1.65484854848556, -1.14659093945895), .Dim = c(37L, 2L))
# NB: x is already standardized
print(apply(x,2,mean))
print(apply(x,2,sd))
y=c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE,
FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE,
TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE,
TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
# NB: y is moderately unbalanced
print(table(y))
# This works OK (with a warning):
set.seed(3)
m = cv.glmnet(x, y, family = "binomial", alpha = 0, standardize = FALSE, type.measure = "class", nfolds = 5)
# This crashes:
set.seed(1)
m = cv.glmnet(x, y, family = "binomial", alpha = 0, standardize = FALSE, type.measure = "class", nfolds = 5)
# Error in predmat[which, seq(nlami)] <- preds :
# replacement has length zero
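A possible workaround on the user side (my assumption, not a confirmed fix) is to pass an explicit lambda sequence that is wide enough to cover every fold's own sequence; all fits then share the same grid, so no fold can extend above the global one. Whether the resulting CV estimates are fully trustworthy is the same correctness question as above.

# Workaround sketch: force a common, wide lambda grid (values chosen for this example)
set.seed(1)
lam = exp(seq(log(50), log(0.5), length.out = 100))
m = cv.glmnet(x, y, family = "binomial", alpha = 0, standardize = FALSE,
              type.measure = "class", nfolds = 5, lambda = lam)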
EDIT: a visualization of the data shows no specific pattern; I would expect low performance from a linear separator. [plots omitted]
Looking at the folds generated by cv.glmnet, it appears that all training sets have at least 8 instances of each class. In the test sets, the most unbalanced split is 6 FALSE vs 1 TRUE. On the other hand, the small number of observations makes cross-validation all the more necessary. - Pierre Gramme
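For reference, the fold-balance check mentioned above can be reproduced along these lines (again assuming cv.glmnet's default fold assignment, so the exact counts may differ):

# Sketch: class balance in each test fold
set.seed(1)
foldid = sample(rep(seq(5), length.out = nrow(x)))
print(table(foldid, y))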