
Getting an error when using glmnet in Caret

Example below. Load libraries:


Load the churn data set from library C50:


Create x and y variables:

churn_x <- subset(churnTest, select= -churn)   
churn_y <- churnTest[[20]]

Use createFolds() to create 5 CV folds on churn_y, the target variable

myFolds <- createFolds(churn_y, k = 5)

Create trainControl object: myControl

myControl <- trainControl(
  summaryFunction = twoClassSummary,
  classProbs = TRUE, # IMPORTANT!
  verboseIter = TRUE,
  savePredictions = TRUE,
  index = myFolds
)

Fit glmnet model: model_glmnet

model_glmnet <- train(
  x = churn_x, y = churn_y,
  metric = "ROC",
  method = "glmnet",
  trControl = myControl
)

I'm getting the following error:

Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
  NA/NaN/Inf in foreign function call (arg 5)
In addition: Warning message:
In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
  NAs introduced by coercion

I have checked, and there are no missing values in the churn_x variables.
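
A check of this kind can be sketched in base R. The toy data frame `toy_x` below is hypothetical and only stands in for churn_x; the point is that a data frame can be free of missing values and still contain non-numeric columns, so it is worth looking at the column classes as well.

```r
# Hypothetical stand-in for churn_x: one factor column, one numeric column
toy_x <- data.frame(
  international_plan = factor(c("no", "yes", "no")),
  total_day_minutes  = c(265.1, 161.6, 243.4)
)

# TRUE if any cell in the data frame is missing
has_na <- any(is.na(toy_x))

# Class of each column; factor columns are not numeric
col_classes <- sapply(toy_x, class)

print(has_na)
print(col_classes)
```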


Does anyone know the answer?


2 Answers


The problem is in the model specification. If you use caret's formula interface to train(), the training works:

train <- data.frame(churn_x, churn_y)

model_glmnet <- train(churn_y ~ ., data = train,
  metric = "ROC",
  method = "glmnet",
  trControl = myControl
)

> model_glmnet$results
  alpha       lambda       ROC      Sens      Spec      ROCSD     SensSD      SpecSD
1  0.10 0.0001754386 0.6958156 0.2845934 0.9123349 0.01855530 0.01616471 0.004002873
2  0.10 0.0017543858 0.7187303 0.2901986 0.9185721 0.01681286 0.01415863 0.005347573
3  0.10 0.0175438576 0.7399174 0.2355121 0.9487161 0.01482812 0.03932741 0.010769455
4  0.55 0.0001754386 0.6988285 0.2901800 0.9121614 0.01907845 0.01312159 0.004200233
5  0.55 0.0017543858 0.7260286 0.2946617 0.9185714 0.01761485 0.02171189 0.006755247
6  0.55 0.0175438576 0.7630039 0.2008939 0.9617103 0.01743847 0.03989938 0.006118592
7  1.00 0.0001754386 0.7009482 0.2924146 0.9119881 0.01958200 0.01233419 0.004157393
8  1.00 0.0017543858 0.7313495 0.2957728 0.9203040 0.01797853 0.02356945 0.008478577
9  1.00 0.0175438576 0.7672690 0.1595779 0.9760892 0.01935176 0.01935583 0.007938801

However, when you specify x and y it will not work, because glmnet expects x to be a numeric matrix. When you supply a formula, caret takes care of creating the model matrix (expanding factors into dummy variables), but if you just specify x and y it assumes x is already a model matrix and passes it to glmnet as is. For instance, this works:

x <- model.matrix(churn_y ~ ., data = train)

model_glmnet2 <- train(x = x, y = churn_y,
                      metric = "ROC",
                      method = "glmnet",
                      trControl = myControl
)

> model_glmnet2$results
  alpha       lambda       ROC      Sens      Spec      ROCSD     SensSD      SpecSD
1  0.10 0.0001754386 0.6958156 0.2845934 0.9123349 0.01855530 0.01616471 0.004002873
2  0.10 0.0017543858 0.7187303 0.2901986 0.9185721 0.01681286 0.01415863 0.005347573
3  0.10 0.0175438576 0.7399174 0.2355121 0.9487161 0.01482812 0.03932741 0.010769455
4  0.55 0.0001754386 0.6988285 0.2901800 0.9121614 0.01907845 0.01312159 0.004200233
5  0.55 0.0017543858 0.7260286 0.2946617 0.9185714 0.01761485 0.02171189 0.006755247
6  0.55 0.0175438576 0.7630039 0.2008939 0.9617103 0.01743847 0.03989938 0.006118592
7  1.00 0.0001754386 0.7009482 0.2924146 0.9119881 0.01958200 0.01233419 0.004157393
8  1.00 0.0017543858 0.7313495 0.2957728 0.9203040 0.01797853 0.02356945 0.008478577
9  1.00 0.0175438576 0.7672690 0.1595779 0.9760892 0.01935176 0.01935583 0.007938801

model.matrix() is needed only when there are factor features.
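
To illustrate what model.matrix() does with a factor feature, here is a minimal base R sketch. The data frame `toy` and its column names are made up for illustration and are not the churn data.

```r
# Hypothetical toy data: one factor feature, one numeric feature, one target
toy <- data.frame(
  plan    = factor(c("no", "yes", "no", "yes")),
  minutes = c(265.1, 161.6, 243.4, 299.4),
  churn_y = factor(c("no", "yes", "no", "no"))
)

# model.matrix() expands the factor into a 0/1 dummy column
# and leaves the numeric column untouched
mm <- model.matrix(churn_y ~ ., data = toy)

print(colnames(mm))    # "(Intercept)" "planyes" "minutes"
print(is.numeric(mm))  # TRUE
```

The result is an all-numeric matrix, which is the kind of input glmnet can work with. The "(Intercept)" column is constant; since glmnet fits its own intercept by default, it could probably be dropped with `mm[, -1]`, although the answer above reports that the fit works with it left in.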


If you want to use glmnet and get the same error, do this!

Short answer: using data.matrix() fixed my issue!

Initially, I was doing:

# Given X and Y are data frames
cv.glmnet(x = as.matrix(X), y = as.matrix(Y), alpha = 1, family = "binomial")

This was fixed by:

cv.glmnet(x = data.matrix(X), y = as.matrix(Y), alpha = 1, family = "binomial")

Longer answer (not long at all):

I had the same problem. I was passing my X matrix using as.matrix(), which coerces all columns of a data frame to a single common type; if you happen to have factors in your data frame, as.matrix() turns everything into character. Using data.matrix() fixed it for me. data.matrix() can handle factors and ordered factors, converting them to their integer codes, whereas as.matrix() is more basic.
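
The difference can be seen on a small example. This is a sketch in base R with a made-up data frame `X`; the column names are assumptions for illustration only.

```r
# Hypothetical toy data frame with a factor column and a numeric column
X <- data.frame(
  plan    = factor(c("no", "yes", "no")),
  minutes = c(265.1, 161.6, 243.4)
)

m_as   <- as.matrix(X)    # character matrix: every entry becomes a string
m_data <- data.matrix(X)  # numeric matrix: factor replaced by integer codes

print(typeof(m_as))       # "character"
print(typeof(m_data))     # "double"
print(m_data[, "plan"])   # 1 2 1
```

Note that data.matrix() encodes factor levels as integer codes rather than as dummy variables. For a factor with more than two unordered levels, that imposes an arbitrary ordering, so model.matrix() may be the better choice in that case.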