I'm completely new to machine learning, but I'm working on a data set with a three-class classification problem and want to compare a few models using caret. When I try to use glmnet, I get the following error messages:
returning Inf
model fit failed for Fold6.Rep10: alpha=0.4198, lambda=0.523974
Error in T[i, ] : subscript out of bounds
There were missing values in resampled performance measures.
Something is wrong; all the Mean_Balanced_Accuracy metric values are missing:
logLoss AUC prAUC Accuracy Kappa Mean_F1 Mean_Sensitivity Mean_Specificity Mean_Pos_Pred_Value Mean_Neg_Pred_Value Mean_Precision
Min. : NA Min. :0.5 Min. : NA Min. : NA Min. : NA Min. : NA Min. : NA Min. : NA Min. : NA Min. : NA Min. : NA
1st Qu.: NA 1st Qu.:0.5 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA
Median : NA Median :0.5 Median : NA Median : NA Median : NA Median : NA Median : NA Median : NA Median : NA Median : NA Median : NA
Mean :NaN Mean :0.5 Mean :NaN Mean :NaN Mean :NaN Mean :NaN Mean :NaN Mean :NaN Mean :NaN Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.:0.5 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. :0.5 Max. : NA Max. : NA Max. : NA Max. : NA Max. : NA Max. : NA Max. : NA Max. : NA Max. : NA
NA's :5 NA's :5 NA's :5 NA's :5 NA's :5 NA's :5 NA's :5 NA's :5 NA's :5 NA's :5
Mean_Recall Mean_Detection_Rate Mean_Balanced_Accuracy
Min. : NA Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA Median : NA
Mean :NaN Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA Max. : NA
NA's :5 NA's :5 NA's :5
Error: Stopping
Error traceback:
5. stop("Stopping", call. = FALSE)
4. train.default(x, y, weights = w, ...)
3. train(x, y, weights = w, ...)
2. train.formula(Species ~ ., data = dfiT, method = "glmnet", trControl = trCtr, metric = "Mean_Balanced_Accuracy", tuneLength = 5, family = "multinomial", type.multinomial = "grouped", standardize.response = F, maximize = T)
1. train(Species ~ ., data = dfiT, method = "glmnet", trControl = trCtr, metric = "Mean_Balanced_Accuracy", tuneLength = 5, family = "multinomial", type.multinomial = "grouped", standardize.response = F, maximize = T)
When I fit the model directly with cv.glmnet, it runs without any issues and I get the expected output. However, I seem to be making a mistake when using caret, and I can't figure out what I'm doing wrong.
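For reference, a direct cv.glmnet fit of the kind described above might look like the sketch below. It uses the plain iris predictors instead of the preprocessed data, and the names x, y, cv.fit and pred as well as the nfolds value are illustrative assumptions, not the exact call from the original analysis.

```r
library(glmnet)

# Illustrative stand-in data: the four numeric iris predictors.
x <- as.matrix(iris[, 1:4])   # glmnet expects a numeric predictor matrix
y <- iris$Species             # three-level factor outcome

set.seed(1)
# Cross-validated multinomial fit with the grouped lasso penalty
cv.fit <- cv.glmnet(x, y,
                    family = "multinomial",
                    type.multinomial = "grouped",
                    nfolds = 10)

# Lambda with the lowest cross-validated deviance
cv.fit$lambda.min

# Predicted class labels at that lambda (one row per observation)
pred <- predict(cv.fit, newx = x, s = "lambda.min", type = "class")
table(predicted = pred[, 1], observed = y)
```

This bypasses caret's resampling and summary functions entirely, so it only shows that glmnet itself can fit a three-class model on data of this shape.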
I'm not working with the iris data frame, but I could replicate the same error I get with my own data by using the iris data. I also added a binary column, since my data contains one as well. The number of observations per class is not equal in my data, but that doesn't seem to be the problem here.
I think this is probably a beginner's error, but I can't seem to find a solution (either online or in the manuals).
Does someone have a suggestion for a possible solution?
This is the code I'm using:
library(caret)

data("iris")
head(iris)
rm(iris)
dfi = iris

# Stratified 80/20 split into training and test sets
i = createDataPartition(dfi$Species, times = 1, p = .8, list = F)
dfiT = dfi[i, ]
dfiTest = dfi[-i, ]

# Estimate the preprocessing on the training set and apply it to both sets
pp <- preProcess(dfiT, method = c("nzv", "YeoJohnson", "center", "scale"))
dfiT <- predict(pp, dfiT)
dfiTest <- predict(pp, dfiTest)

# Add a random 0/1 column, since my real data also contains a binary column
dfiT$bin = runif(length(dfiT))
dfiT$bin = ifelse(dfiT$bin > .5, 1, 0)
dfiTest$bin = runif(length(dfiTest))
dfiTest$bin = ifelse(dfiTest$bin > .5, 1, 0)

# Resampling indices intended for 12-fold CV repeated 10 times
indFold = createMultiFolds(dfiT,
                           k = 12,
                           times = 10)

# Resampling setup: multiclass summary, random search, SMOTE subsampling
trCtr = trainControl(method = "repeatedcv",
                     savePredictions = "final",
                     returnResamp = "final",
                     classProbs = T,
                     summaryFunction = multiClassSummary,
                     selectionFunction = best,
                     search = "random",
                     sampling = "smote",
                     index = indFold
)

# This is the call that fails
net.fit = train(Species ~ ., data = dfiT,
                method = "glmnet",
                trControl = trCtr,
                metric = "Mean_Balanced_Accuracy",
                tuneLength = 5,
                family = "multinomial", type.multinomial = "grouped", standardize.response = F,
                maximize = T)
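One sanity check on the resampling setup that may help narrow this down: caret's documentation describes the first argument of createMultiFolds as a vector of outcomes, so the sketch below builds the index from the outcome vector and confirms that the resulting row indices stay within the data. It uses plain iris, and the names y and folds are illustrative only. Whether this relates to the error above is an assumption and has not been verified.

```r
library(caret)

set.seed(1)
y <- iris$Species   # outcome vector (illustrative stand-in for the real outcome)

# Training-row indices for 12-fold CV repeated 10 times
folds <- createMultiFolds(y, k = 12, times = 10)

length(folds)            # 12 folds x 10 repeats = 120 index sets
range(unlist(folds))     # every index should lie between 1 and length(y)
head(names(folds), 3)    # names of the first few resamples
```

Running the same three checks on the index actually passed to trainControl shows quickly whether it refers to valid rows of the training data.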