Context and error message
I am trying to fit a two-class prediction model using glmnet within caret. I get an error when using caret's default tuning grid. I don't think badly formatted data is the cause, because the fit succeeds when I specify my own tuning grid. The error message is:
Error in loop$lambda[loop$alpha == alph[i]] <- np[which.max(np)] :
replacement has length zero
When checking the line at which the error occurs, one sees that R tries to find a maximum with which.max() over a vector np of NA values (the lambda values chosen by caret/glmnet?). I failed to debug this properly because I cannot find a way to step through each line of code after calling train(). I hope somebody with more experience can help me out.
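The failing assignment can be reproduced in isolation with base R alone. This is only a minimal sketch, assuming np really is all NA at that point; the one-row loop data frame and its alpha value are made up for illustration and are not taken from caret's internals:

```r
# Hypothetical stand-ins for the objects named in the error message
np <- c(NA_real_, NA_real_)                # assumed: every candidate lambda is NA
alph <- 0.1                                # made-up alpha value
loop <- data.frame(alpha = alph, lambda = NA)

idx <- which.max(np)                       # which.max() skips NA, so no index is found
value <- np[idx]                           # indexing with integer(0) gives a zero-length vector

# Same shape of assignment as in the error message, wrapped to capture the message
msg <- tryCatch({
  loop$lambda[loop$alpha == alph[1]] <- value
  "no error"
}, error = function(e) conditionMessage(e))

print(length(idx))
print(length(value))
print(msg)
```

If that assumption holds, the error would point to an empty or all-NA lambda sequence rather than to the assignment itself.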
Minimal working example
I created a minimal working example by making my dataset as small as possible (it started with ~200 rows and ~40 columns) while preserving the error. Note that manualModelFit works fine but modelFit cannot be computed:
library(caret)
library(glmnet)
# create data frame of features
var1 <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1)
var2 <- c(1,1,1,1,1,0,1,1,1,1,1,0,1,1,0,1,1)
trainData <- data.frame(v1 = var1, v2 = var2)
# create factor vector of outcomes
trainClass <- as.factor(c('event','event','event','event','event','event','event','event','event','event','nonEvent','event','event','event','event','event','nonEvent'))
# set k for k-fold CV
kInner <- 5
# set randomization seed
mySeed <- 1622017
# set options for caret in fitControl
fitControl <- trainControl( method = 'cv', number = kInner, classProbs = TRUE, allowParallel = FALSE, summaryFunction = twoClassSummary, verboseIter = FALSE)
# run parameter tuning with a user-specified tuning grid
set.seed(mySeed)
myTuneGrid <- expand.grid(alpha = c(0,0.5,1), lambda = c(0,0.5,1))
manualModelFit <- train(x = trainData, y = trainClass, method = 'glmnet' , trControl = fitControl, metric = 'ROC', tuneGrid = myTuneGrid)
# run default parameter tuning
set.seed(mySeed)
modelFit <- train(x = trainData, y = trainClass, method = 'glmnet' , trControl = fitControl, metric = 'ROC')
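One check that may help narrow this down is to build the default grid by hand instead of stepping through train(). The sketch below assumes that train() obtains its default grid from the grid element returned by getModelInfo('glmnet'), and that len = 3 matches the default tuneLength; both are assumptions on my part and may differ between caret versions. It rebuilds the same data so that it can be run on its own:

```r
library(caret)
library(glmnet)

# same data as in the example above, written more compactly
trainData <- data.frame(v1 = c(rep(0, 16), 1),
                        v2 = c(1,1,1,1,1,0,1,1,1,1,1,0,1,1,0,1,1))
trainClass <- factor(ifelse(seq_len(17) %in% c(11, 17), 'nonEvent', 'event'))

# model definition that caret registers for method = 'glmnet'
glmnetInfo <- getModelInfo('glmnet', regex = FALSE)[[1]]

# assumed to mirror how train() creates its default grid (len = 3 is assumed)
defaultGrid <- glmnetInfo$grid(x = trainData, y = trainClass, len = 3)
print(defaultGrid)

# lambda path of a direct glmnet fit, for comparison with the grid above
directFit <- glmnet(as.matrix(trainData), trainClass, family = 'binomial')
print(directFit$lambda)
```

If the lambda column of defaultGrid turns out to be NA or empty, that would suggest the problem arises while the grid is built, before any resampling takes place.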
The questions
What causes the failure? Is this a bug within caret/glmnet or is this due to a property of the dataset that I overlooked? This error occurs in multiple datasets that I analyze.