
I have been trying to run the code below with caret, but I get the following error. Can anyone tell me how to troubleshoot it?

Error in `[.data.frame`(data, , lvls[1]) : undefined columns selected

library(tidyverse)
library(caret)

mydf <- iris

mydf <- mydf %>% 
  mutate(tgt = as.factor(ifelse(Species == 'setosa','Y','N'))) %>% 
  select(everything(), -Species)

trainIndex <- createDataPartition(mydf$tgt, p = 0.75, times = 1, list = FALSE)
train <- mydf[trainIndex,]
test <- mydf[-trainIndex,]

fitControl <- trainControl(method = 'repeatedcv',
                           number = 10,
                           repeats = 10,
                           allowParallel = TRUE,
                           summaryFunction = twoClassSummary)

fit_log <- train(tgt ~ .,
                 data = train,
                 method = "glm",
                 trControl = fitControl,
                 family = "binomial")
It looks like your issue is that train names both your training set and the caret function. Rename one of them and see if you still have the problem. - CPak
Hi, I renamed the train and test objects to mytrain and mytest, but the same error persists. - John Smith
The summaryFunction argument in fitControl is what causes the error. I'm not sure what it does, so I can't help further, but that should get you started. - CPak
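To see where that error comes from, the sketch below mimics in base R the lookup the summary function performs on the hold-out predictions. The data frame held_out and its values are hypothetical; the assumption (consistent with the `lvls[1]` in the error message) is that twoClassSummary selects the probability column named after the first class level, and that caret only adds such columns when classProbs = TRUE.

```r
# Hypothetical stand-in for the hold-out predictions caret passes to the
# summary function when classProbs = FALSE: only obs and pred are present.
lvls <- c("N", "Y")
held_out <- data.frame(
  obs  = factor(c("N", "Y", "N", "Y"), levels = lvls),
  pred = factor(c("N", "Y", "Y", "Y"), levels = lvls)
)

# Selecting a column named after the first level fails, because no column
# "N" exists; this is the same `[.data.frame` error as in the question.
err <- tryCatch(held_out[, lvls[1]], error = function(e) conditionMessage(e))
print(err)

# With class probabilities, there is one column per class level, named after
# the level, so the same lookup succeeds (values here are made up).
held_out$N <- c(0.9, 0.2, 0.4, 0.1)
held_out$Y <- 1 - held_out$N
print(held_out[, lvls[1]])
```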

2 Answers


You need to use classProbs = TRUE in your control function (trainControl). twoClassSummary computes the area under the ROC curve from the class probabilities, and the error is the summary function not finding those probability columns.
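A minimal sketch of the question's code with that fix applied. Beyond classProbs = TRUE, it sets metric = "ROC", since twoClassSummary reports ROC, Sens and Spec rather than the default Accuracy. The name train_df and the seed are illustrative choices (the rename simply avoids reusing the name train), and twoClassSummary needs the pROC package installed.

```r
library(caret)

# Rebuild the question's data: setosa vs. the other two species.
mydf <- iris
mydf$tgt <- factor(ifelse(mydf$Species == "setosa", "Y", "N"))
mydf$Species <- NULL

set.seed(42)  # illustrative seed, only for reproducibility
trainIndex <- createDataPartition(mydf$tgt, p = 0.75, times = 1, list = FALSE)
train_df <- mydf[trainIndex, ]

fitControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 10,
                           allowParallel = TRUE,
                           classProbs = TRUE,               # adds the "N"/"Y" probability columns
                           summaryFunction = twoClassSummary)

fit_log <- train(tgt ~ .,
                 data = train_df,
                 method = "glm",
                 family = "binomial",
                 metric = "ROC",                            # a metric twoClassSummary reports
                 trControl = fitControl)

print(fit_log$results)
```

Because setosa is perfectly separable from the other species, glm may emit warnings such as "fitted probabilities numerically 0 or 1 occurred"; these are warnings, not the error from the question.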


Wrap your training data in data.frame(), i.e. data = data.frame(xxxxx), as in the example below:

fit.cart <- train(Condition~., data = data.frame(trainset), method="rpart", metric=metric, trControl=control)