4
votes

I have been trying to get the code below to run in caret, but I get the following error. Can anyone tell me how to troubleshoot it?

Error in [.data.frame(data, , lvls[1]) : undefined columns selected

library(tidyverse)
library(caret)

mydf <- iris

mydf <- mydf %>% 
  mutate(tgt = as.factor(ifelse(Species == 'setosa','Y','N'))) %>% 
  select(everything(), -Species)

trainIndex <- createDataPartition(mydf$tgt, p = 0.75, times = 1, list = FALSE)
train <- mydf[trainIndex,]
test <- mydf[-trainIndex,]

fitControl <- trainControl(method = 'repeatedcv',
                       number = 10,
                       repeats = 10,
                       allowParallel = TRUE,
                       summaryFunction = twoClassSummary)

fit_log <- train(tgt~.,
             data = train,
             method = "glm",
             trControl = fitControl,
             family = "binomial")
2
Looks like your issue is that train refers to both your training set data and the caret function. Disambiguate and see if you still have a problem... – CPak
Hi, I changed the train and test portions of the code to mytrain and mytest, but the same error persists. – John Smith
summaryFunction in fitControl is causing the error. Not sure what it does, so I can't help you there, but it should get you started. – CPak

2 Answers

8
votes

You need to use classProbs = TRUE in your control function (trainControl). With summaryFunction = twoClassSummary, the ROC curve is computed from the class probabilities, and the error comes from the summary function not finding those probability columns.
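As a minimal sketch of that fix applied to the question's code (assuming the pROC package is installed, since twoClassSummary relies on it), the only required change is the classProbs argument. The set.seed call, metric = "ROC", and fitting on the full data rather than a train/test split are my additions for a compact, reproducible example, not part of the original post.

```r
library(caret)

set.seed(42)  # assumption: added only for reproducibility

# Same binary target as in the question, built with base R
mydf <- iris
mydf$tgt <- factor(ifelse(mydf$Species == "setosa", "Y", "N"))
mydf$Species <- NULL

fitControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 10,
                           classProbs = TRUE,  # keep per-class probability columns
                           summaryFunction = twoClassSummary)

# metric = "ROC" selects on the ROC value that twoClassSummary reports
fit_log <- train(tgt ~ .,
                 data = mydf,
                 method = "glm",
                 family = "binomial",
                 metric = "ROC",
                 trControl = fitControl)

fit_log$results
```

Because setosa is perfectly separable from the other two species, glm will probably warn about non-convergence or fitted probabilities of 0 or 1; those warnings are unrelated to the original error. Note also that classProbs = TRUE requires the factor levels to be valid R variable names, which "Y" and "N" are.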

0
votes

Use data = data.frame(xxxxx), as in the example below:

fit.cart <- train(Condition~., data = data.frame(trainset), method="rpart", metric=metric, trControl=control)