4 votes

I am trying to use various prediction algorithms from the caret package in R for a regression problem; that is, my target variable is continuous. caret treats the problem as classification, and when I pass any of the regression models I get an error message that says "wrong model type for classification". For reproducibility, consider the Combined Cycle Power Plant Data Set. The data is in CCPP.zip. The goal is to predict power as a function of the other variables. Power is a continuous variable.

  library(readxl)
  library(caret)
  power_plant = read_excel("Folds5x2_pp.xlsx")
  apply(power_plant,2, class)   # shows all columns are numeric

  control <- trainControl(method="repeatedcv", number=10, repeats=5)

  my_glm <- train(power_plant[,1:4], power_plant[,5],
                  method = "lm",
                  preProc = c("center", "scale"),
                  trControl = control)

The error message is shown in my screenshot:

[screenshot of the error message]

2 Answers

2 votes

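caret gets confused by tibbles, the tidyverse variant of a data frame that read_excel returns. The likely cause is that subsetting a tibble with power_plant[,5] returns a one-column tibble rather than a numeric vector, so caret does not recognize the outcome as numeric and treats the problem as classification. Converting the tibble to a plain data frame before giving it to caret makes everything work: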

library(readxl)
library(caret)
power_plant = read_excel("Folds5x2_pp.xlsx")
apply(power_plant,2, class)   # shows all columns are numeric

power_plant <- data.frame(power_plant)
control <- trainControl(method="repeatedcv", number=10, repeats=5)

my_glm <- train(power_plant[,1:4], power_plant[,5],
                method = "lm",
                preProc = c("center", "scale"),
                trControl = control)

my_glm

yielding:

Linear Regression 

9568 samples
   4 predictor

Pre-processing: centered (4), scaled (4) 
Resampling: Cross-Validated (10 fold, repeated 5 times) 
Summary of sample sizes: 8612, 8612, 8611, 8612, 8612, 8610, ... 
Resampling results:

  RMSE      Rsquared 
  4.556703  0.9287933

Tuning parameter 'intercept' was held constant at a value of TRUE
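
To see why the conversion matters, here is a minimal sketch on made-up data. The tibble `toy` and its column names are hypothetical stand-ins for the spreadsheet, not the real CCPP columns. It only illustrates how single-column subsetting differs between a tibble and a plain data frame; it does not show caret's internal checks.

```r
library(tibble)

# Hypothetical stand-in for the spreadsheet: four predictors and a continuous target.
set.seed(1)
toy <- tibble(a = rnorm(5), b = rnorm(5), c = rnorm(5), d = rnorm(5),
              power = rnorm(5))

# Single-column subsetting of a tibble keeps the tibble wrapper,
# so the result is a one-column table, not a numeric vector.
y_tbl <- toy[, 5]
is.numeric(y_tbl)   # FALSE

# Two ways to obtain a plain numeric vector instead:
y_vec <- toy[[5]]                  # double-bracket extraction
y_df  <- as.data.frame(toy)[, 5]   # plain data frames drop to a vector
is.numeric(y_vec)
identical(y_vec, y_df)
```

Under this reading, passing power_plant[[5]] as the outcome should be an alternative to converting the whole table, although only the data.frame() route was tested in this answer.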

0 votes

I get a similar error when I pass the formula as a named argument, formula = y ~ x. It works fine if I omit the argument name and just pass y ~ x.
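
As a rough sketch of the working form, the call below passes the formula positionally. The data frame `toy` and its columns `x` and `y` are made up for illustration. One plausible explanation, stated here as an assumption, is that the formula method of train names its first argument `form`, so an argument called `formula` is not matched to it.

```r
library(caret)

# Hypothetical toy data: one predictor x and a continuous outcome y.
set.seed(42)
toy <- data.frame(x = rnorm(100))
toy$y <- 2 * toy$x + rnorm(100)

control <- trainControl(method = "cv", number = 5)

# The formula is passed positionally, without naming the argument.
fit <- train(y ~ x, data = toy, method = "lm", trControl = control)

# Reports whether caret treated the task as regression or classification.
fit$modelType
```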