2
votes

I obtained the following code from the Stack Overflow question "caret train() predicts very different then predict.glm()".

The following code is producing an error. I am using caret 6.0-52.

library(car); library(caret); library(e1071)

#data import and preparation
data(Chile)
chile        <- na.omit(Chile)  #remove "na's"
chile        <- chile[chile$vote == "Y" | chile$vote == "N" , ] #only "Y" and "N" required
chile$vote   <- factor(chile$vote)      #required to remove unwanted levels
chile$income <- factor(chile$income)  # treat income as a factor

tc <- trainControl("cv", 2, savePredictions=TRUE, classProbs=TRUE,
               summaryFunction=twoClassSummary)  # "cv" = cross-validation, here 2-fold
# refer to columns by name; data = chile already tells train() where to find them
fit <- train(vote ~ sex + education + statusquo,
         data      = chile    ,
         method    = "glm"    ,
         family    = binomial ,
         metric    = "ROC"    ,
         trControl = tc)

Running this code produces the following error.

Something is wrong; all the ROC metric values are missing:
  ROC           Sens             Spec       
 Min.   : NA   Min.   :0.9354   Min.   :0.9187  
 1st Qu.: NA   1st Qu.:0.9354   1st Qu.:0.9187  
 Median : NA   Median :0.9354   Median :0.9187  
 Mean   :NaN   Mean   :0.9354   Mean   :0.9187  
 3rd Qu.: NA   3rd Qu.:0.9354   3rd Qu.:0.9187  
 Max.   : NA   Max.   :0.9354   Max.   :0.9187  
 NA's   :1                                      
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.

Does anyone know what the issue is, or can anyone reproduce (or fail to reproduce) this error? I've seen other answers about this error message saying it is caused by some classes not being represented in every cross-validation fold, but that isn't the issue here, as the number of folds is set to 2.

4
Did you manage to solve this? I'm facing a similar issue, but none of the answers below helped me. – ciri

4 Answers

1
votes

Looks like I needed to install and load the pROC package.

install.packages("pROC")
library(pROC)
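A quick way to confirm this diagnosis before re-running train() is to check whether pROC can be loaded at all. The following is only a minimal sketch in base R; the variable name has_proc is made up for illustration, and it assumes that the missing ROC values come from twoClassSummary being unable to call pROC.

```r
# Check whether the pROC namespace can be loaded, without attaching it.
# has_proc is an illustrative name, not part of caret or pROC.
has_proc <- requireNamespace("pROC", quietly = TRUE)

if (has_proc) {
  message("pROC is available; twoClassSummary should be able to compute ROC.")
} else {
  message("pROC is missing; run install.packages(\"pROC\") and call train() again.")
}
```

If the check reports that pROC is available and the ROC column is still NA, the cause is probably something else.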

0
votes

You should install using

install.packages("caret", dependencies = c("Imports", "Depends", "Suggests"))

That gets most of the default packages. If there are specific modeling packages that are missing, the code usually prompts you to install them.

0
votes

I know I'm late to the party, but I think you need to set classProbs = TRUE in trainControl().

0
votes

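You are fitting a logistic regression when you use the parameters method = "glm", family = binomial. In this case, you must make sure that the target variable (chile$vote) has exactly 2 factor levels, because logistic regression only performs binary classification.

If the target has more than two labels, glm cannot fit it, because base R's glm() has no multinomial family. Use a multiclass method instead, such as method = "multinom" (from the nnet package), or glmnet, which does accept family = "multinomial".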
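The two-level requirement can be checked before calling train(). Below is a minimal sketch in base R on toy data (the vector vote is invented for illustration and only mimics the four levels of Chile$vote). It shows why the question's code re-creates the factor after filtering: subsetting alone keeps the unused levels.

```r
# Toy stand-in for Chile$vote, which has the levels A, N, U and Y.
vote <- factor(c("Y", "N", "A", "U", "Y", "N"))

# Keep only the "Y" and "N" rows; the unused levels are still attached.
vote_yn <- vote[vote %in% c("Y", "N")]
n_before <- nlevels(vote_yn)   # 4

# Drop the unused levels (factor(vote_yn) would have the same effect).
vote_yn <- droplevels(vote_yn)
n_after <- nlevels(vote_yn)    # 2

# Fail early if the target is not binary.
stopifnot(n_after == 2)
```

In the question's code this check would already pass, since chile$vote is rebuilt with factor() after the filter.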