
I was trying to use xgboost for classification of the iris data, but I am facing this error:

"Error in frankv(predicted) : x is a list, 'cols' can not be 0-length In addition: Warning message: In train.default(x_train, y_train, trControl = ctrl, tuneGrid = xgbgrid, : cannnot compute class probabilities for regression"

I am using the following code. Any help or explanation would be highly appreciated.

data(iris)
library(caret)
library(dplyr)
library(xgboost)

set.seed(123)
index <- createDataPartition(iris$Species, p=0.8, list = FALSE)
trainData <- iris[index,]
testData <- iris[-index,]


x_train = xgb.DMatrix(as.matrix(trainData %>% select(-Species)))
y_train = as.numeric(trainData$Species)



#### Generic control parameters
ctrl <- trainControl(method="repeatedcv", 
                    number=10, 
                    repeats=5,
                    savePredictions=TRUE, 
                    classProbs=TRUE,
                    summaryFunction = twoClassSummary)

xgbgrid <- expand.grid(nrounds = 10,
                    max_depth = 5,
                    eta = 0.05,
                    gamma = 0.01,
                    colsample_bytree = 0.75,
                    min_child_weight = 0,
                    subsample = 0.5,
                    objective = "binary:logitraw",
                    eval_metric = "error")


set.seed(123)
xgb_model = train(x_train, 
                y_train,  
                trControl = ctrl,
                tuneGrid = xgbgrid,
                method = "xgbTree")
Take a look at this line: y_train = as.numeric(trainData$Species). Also, the twoClassSummary function is not appropriate here, since Species has three levels; use multiClassSummary instead. Fix those two and you're good to go. (Function names in this comment may not be in the correct case.) – NelsonGon
Thanks for identifying the error in the class summary. However, I tried converting y to a factor with y_train <- as.factor(as.numeric(trainData$Species)) and am now getting this error: "Error: At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to X0, X1, X2. Please use factor levels that can be used as valid R variable names (see ?make.names for help)." – ABS2019
Just use as.factor, not as.factor(as.numeric()), although Species is already a factor in the iris data set, which negates the need for that. I ran it without issues; I didn't use your tune grid and stopped the training early since it would have taken a long time, but it was going to work anyway. – NelsonGon
Yes, it runs now, but no results came out (tried both with and without the grid). The output says "Something is wrong; all the Accuracy metric values are missing", and the resampling summary shows NA for every metric (logLoss, prAUC, Accuracy, Kappa, Mean_F1, Mean_Sensitivity, Mean_Specificity) except AUC (Min. :0.5). – ABS2019
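
Regarding the "not a valid R variable name" error above: caret derives column names for the class-probability matrix from the factor levels, so the levels must be syntactically valid R names. A minimal sketch illustrating the difference:

# Levels "1", "2", "3" are not valid R variable names, so caret raises the error:
bad <- as.factor(as.numeric(iris$Species))
levels(bad)              # "1" "2" "3"
make.names(levels(bad))  # "X1" "X2" "X3" (what the levels would be coerced to)

# Keeping the original labels avoids the problem entirely:
good <- as.factor(iris$Species)
levels(good)             # "setosa" "versicolor" "virginica"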

1 Answer


There are a few issues:

  1. The outcome variable should be a factor.

  2. The tune grid includes parameters (objective and eval_metric) that are not tuning parameters in caret's xgbTree method, so they must be dropped from expand.grid(); see the modelLookup() sketch after this list.

  3. Since Species has three levels, a two-class summary is inappropriate; use a multiclass summary via summaryFunction = multiClassSummary.
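
To check which columns the tune grid may contain, caret's modelLookup() can be queried (a quick sketch):

library(caret)
# The tuning parameters caret recognises for method = "xgbTree":
# nrounds, max_depth, eta, gamma, colsample_bytree, min_child_weight, subsample
modelLookup("xgbTree")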

A working example:

data(iris)
library(caret)
library(dplyr)
library(xgboost)
set.seed(123)
index <- createDataPartition(iris$Species, p=0.8, list = FALSE)
trainData <- iris[index,]
testData <- iris[-index,]


x_train = xgb.DMatrix(as.matrix(trainData %>% select(-Species)))
y_train = as.factor(trainData$Species)



#### Generic control parameters
ctrl <- trainControl(method="repeatedcv", 
                     number=10, 
                     repeats=5,
                     savePredictions=TRUE, 
                     classProbs=TRUE,
                     summaryFunction = multiClassSummary)

xgbgrid <- expand.grid(nrounds = 10,
                       max_depth = 5,
                       eta = 0.05,
                       gamma = 0.01,
                       colsample_bytree = 0.75,
                       min_child_weight = 0,
                       subsample = 0.5)


set.seed(123)
xgb_model = train(x_train, 
                  y_train,  
                  trControl = ctrl,
                  method = "xgbTree",
                  tuneGrid = xgbgrid)
xgb_model
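
As an optional follow-up (not in the original answer), the fitted model can be evaluated on the held-out test set. A minimal sketch, assuming predict() on the caret model accepts a plain numeric matrix:

x_test <- as.matrix(testData %>% select(-Species))
preds <- predict(xgb_model, newdata = x_test)

# Compare predicted vs. actual species on the held-out rows
confusionMatrix(preds, testData$Species)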