I'm trying to run a repeated 4-fold cross-validated regression with caret on a dataset with 28 samples, but the call to train() below fails with an error:

> data1
     X1  X2   X3  outcome
1     7   0  180      108
2   130   0   35      104
3     0   0    3       97
4    23   0    0       11
5   122   0  383       16
6   103   0  272       74
7   403   0    0       58
8   127   0    0       16
9    35   0  268       52
10  353  10  420       49
11  211   0  220       47
12   28   0   18       50
13  210   0  603       39
14  260   1  313       37
15    5   0  468       29
16   40   0    9       10
17  255   0  229       33
18  254   6  205       29
19    4  28  165       44
20  225   0  147       14
21  339   0    0       23
22  347   2  324       20
23  214   3  313       16
24   73   4  386       13
25  297   0  369      118
26  248   0  492       92
27   89   0    0       87
28    5   0    9       80

> set.seed(123)
> train.control <- trainControl(method = "repeatedcv", number = 4, repeats = 3)
> model <- train(data1$outcome ~., data = data1, method = "lm",trControl = train.control)
Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) : 
  undefined columns selected

I also tried removing the outcome (data=data1[,-4]) but I still get the same error. Can you help me with this?

1 Answer

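Refer to the outcome by its bare column name in the formula (outcome ~ .) instead of data1$outcome. The data argument already tells train() where to find the columns. With data1$outcome on the left-hand side, the formula's variable list includes the name data1 itself, and train() then appears to look for a column called data1 in the data frame, which does not exist. That is what "undefined columns selected" is complaining about, and it is also why dropping the outcome column from data did not help.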

library(caret)
set.seed(123)
train.control <- trainControl(method = "repeatedcv", number = 4, repeats = 3)
model <- train(outcome ~ ., data = data1, method = "lm", trControl = train.control)
model
#Linear Regression 

#28 samples
# 3 predictor

#No pre-processing
#Resampling: Cross-Validated (4 fold, repeated 3 times) 
#Summary of sample sizes: 20, 21, 22, 21, 22, 20, ... 
#Resampling results:

#  RMSE      Rsquared    MAE     
#  38.78937  0.08910678  33.24453

#Tuning parameter 'intercept' was held constant at a value of TRUE
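
To see where the error in the question likely comes from, here is a minimal base-R sketch. It does not call caret at all; it only mimics the `data[, all.vars(Terms), drop = FALSE]` subsetting shown in the error message. The data frame `toy` is a hypothetical stand-in built from the first four rows of data1 (X1 and outcome only).

```r
# Hypothetical stand-in for data1: first four rows, two columns only
toy <- data.frame(X1 = c(7, 130, 0, 23), outcome = c(108, 104, 97, 11))

# all.vars() returns every symbol in the formula, including the
# data frame name when the $ operator is used
bad  <- all.vars(toy$outcome ~ X1)
good <- all.vars(outcome ~ X1)

print(bad)   # "toy" "outcome" "X1"
print(good)  # "outcome" "X1"

# Names that are not columns of the data frame
print(setdiff(bad, names(toy)))   # "toy"
print(setdiff(good, names(toy)))  # character(0)

# Selecting columns by the "bad" names fails, because "toy" is not a column
msg <- tryCatch(
  toy[, bad, drop = FALSE],
  error = function(e) conditionMessage(e)
)
print(msg)

# Selecting by the "good" names works
print(toy[, good, drop = FALSE])
```

As a rule of thumb, when a function takes both a formula and a data argument, write the formula with bare column names and let data supply them.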