2
votes

I'm trying to apply stacking to my dataset, but I ran into the error below.

# Load library
library(DJL)
library(caret)
library(caretEnsemble)

# Load data and rename the target levels to valid R variable names (needed when classProbs = TRUE)
df <- dataset.engine.2015[, -c(1, 2)]
levels(df$Type) <- list(NA.D = "NA-D", NA.P = "NA-P", SC.P = "SC-P", TC.D = "TC-D", TC.P = "TC-P")

# Run
st.methods <- c("lda", "rpart", "glm", "knn", "svmRadial")
st.control <- trainControl(method = "repeatedcv", number = 5, repeats = 3, 
                           savePredictions = T, classProbs = T)
st.models  <- caretList(Type ~., data = df, trControl = st.control, methodList = st.methods)

Then I get this:

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :1     NA's   :1    
Error: Stopping
In addition: There were 18 warnings (use warnings() to see them)

Can anyone help me to fix this error?

1
Could you try to make your example reproducible? Otherwise any advice would be mere speculation. – Silence Dogood
Sorry, I forgot to load the package "caretEnsemble". Now you can reproduce my error. Thanks for pointing that out! – user4143385
Without a few observations from your input dataset dataset.engine.2015, your example is not reproducible. If your dataset is proprietary, you can use the options mentioned in the first comment to anonymize it. Essentially, create a small test dataset on which you receive the error and post the output of dput(DF), where DF is the test dataset. – Silence Dogood
I don't know what you mean by "not reproducible". The dataset (dataset.engine.2015) ships with the package "DJL", so I believe you can simply run my code to load it and reproduce my issue. Please advise if this is not what you asked. – user4143385
I am sorry, please excuse my unfamiliarity with the DJL package and my assumption that dataset.engine.2015 was a user-defined dataset. – Silence Dogood

1 Answer

2
votes

The glm method in caret fits a binomial logistic regression, so it cannot be used to predict a categorical dependent variable with more than two categories (here Type has five classes). Either delete glm from st.methods, or substitute it with a multiclass-capable method such as multinom, gbm, or rf (random forest).
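
For example, the full model list from the question should run once glm is swapped for multinom. This is only a sketch that reuses df and st.control exactly as defined in the question; nothing else changes:

# Sketch: same call as in the question, with "glm" replaced by "multinom"
st.methods <- c("lda", "rpart", "multinom", "knn", "svmRadial")
st.models  <- caretList(Type ~., data = df, trControl = st.control, methodList = st.methods)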

Here are two useful experiments. In the first we consider only glm:

rm(list=ls())
library(DJL)
library(caret)
library(caretEnsemble)  
df <- dataset.engine.2015[, -c(1, 2)]
levels(df$Type) <- list(NA.D = "NA-D", NA.P = "NA-P", SC.P = "SC-P", TC.D = "TC-D", TC.P = "TC-P")

st.control <- trainControl(method = "repeatedcv", number = 5, repeats = 3, 
                           savePredictions = T, classProbs = T)

st.methods <- c("glm")
st.models  <- caretList(Type ~., data = df, trControl = st.control, methodList = st.methods)

Here is the error message:

Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :1     NA's   :1    
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: There were 18 warnings (use warnings() to see them)

Now we substitute glm with multinom:

st.methods <- c("multinom")
st.models  <- caretList(Type ~., data = df, trControl = st.control, methodList = st.methods)
print(st.models)

The output is:

$multinom
Penalized Multinomial Regression 

1206 samples
   5 predictor
   5 classes: 'NA.D', 'NA.P', 'SC.P', 'TC.D', 'TC.P' 

No pre-processing
Resampling: Cross-Validated (5 fold, repeated 3 times) 
Summary of sample sizes: 964, 965, 965, 965, 965, 964, ... 
Resampling results across tuning parameters:

  decay  Accuracy   Kappa    
  0e+00  0.9306411  0.8518294
  1e-04  0.9300901  0.8506964
  1e-01  0.9328507  0.8564466

Accuracy was used to select the optimal model using  the largest value.
The final value used for the model was decay = 0.1.
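
Once caretList runs without error, the models can be combined into a stack with caretStack. The following is only a sketch: it assumes st.models was built with the corrected method list above, the choice of a random forest ("rf") meta-learner is purely illustrative, and multiclass stacking support may depend on your caretEnsemble version:

# Sketch only: the meta-learner ("rf") and its resampling settings are illustrative choices
st.stack <- caretStack(st.models, method = "rf", metric = "Accuracy",
                       trControl = trainControl(method = "cv", number = 5,
                                                savePredictions = "final",
                                                classProbs = TRUE))
print(st.stack)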