How to store individual prediction models when looping over a C5.0 decision tree with cross validation in R?

Question

I am new to R and am using a for loop to implement 5-fold cross validation with a C5.0 decision tree for an assignment. My dataset looks as follows:

head(data_known)
order_item_id order_date item_id item_size brand_id item_price user_id
1             1    2012-09    1507   UNSIZED      102       24.9   4694
2             2    2012-11    1745        10       64       75.0   6097
3             3    2013-01    2588       XXL       42       79.9   7223
4             4    2012-08     164        40       47       79.9   4124
5             5    2012-09    1640         L       97       69.9    881
6             6    2013-03    2378        38       72      129.9   1576
user_title user_dob             user_state user_reg_date
1        Mrs  1964-11   Rhineland-Palatinate       2011-02
2        Mrs  1973-08            Brandenburg       2011-05
3        Mrs  1949-08               Saarland       2013-01
4        Mrs  1960-12              Thuringia       2012-08
5        Mrs  1971-06     Baden-Wuerttemberg       2012-01
6        Mrs  1965-10 North Rhine-Westphalia       2011-02   
delivery_time_days user_title_NA item_size_NA user_dob_NA    target
1                  2             0            0           0    Return
2                  4             0            0           0 No Return
3                  2             0            0           0    Return
4                  5             0            0           0    Return
5                  3             0            0           0    Return
6                 11             0            0           0    Return

Now, my loop is:

library(plyr)   # provides ldply
library(C50)    # provides C5.0

# Build the formula target ~ all other columns (target is column 16)
explanatory_variables.dt <- names(data_known)[-16]
form.dt <- as.formula(paste("target ~", paste(explanatory_variables.dt, collapse = "+")))
# Randomly split the rows into 5 folds
folds.dt <- split(data_known, cut(sample(1:nrow(data_known)), 5))
errs.c50.dt <- rep(NA, length(folds.dt))

for (i in 1:length(folds.dt)) {
  test.dt <- ldply(folds.dt[i], data.frame)     # held-out fold
  train.dt <- ldply(folds.dt[-i], data.frame)   # remaining folds
  tmp.model.dt <- C5.0(form.dt, train.dt)
  tmp.predict.dt <- predict(tmp.model.dt, newdata = test.dt)
  conf.mat.dt <- table(test.dt$target, tmp.predict.dt)
  errs.c50.dt[i] <- 1 - sum(diag(conf.mat.dt)) / sum(conf.mat.dt)
}
print(sprintf("average error using k-fold cross validation and C5.0 decision tree algorithm: %.3f percent", 100 * mean(errs.c50.dt)))

How do I access/save the whole tree model inside the loop so that I can predict the target variable in another dataset where its true values are still unknown? Or do I have to base the predictions on tmp.model.dt alone when using cross validation?

Thank you in advance for your help.

Best,

Nico

The structure you're after is a list. Create one and store the model there. You can save the list using save for later use. — Roman Luštrik
Thank you for the quick reply, Roman. I was able to solve it by now due to comments from you and j. — Nico

jmuhlenkamp · Accepted Answer · 2017-11-26T02:21:51

Here is a simple reproducible answer that expands upon Roman's comment.

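list_models <- list()                # empty list to collect the fitted models
for (i in 1:2){
   tmp_data <- mtcars[,c(1, i+1)]    # mpg plus one predictor (cyl, then disp)
   list_models[[i]] <- lm(mpg ~ ., data = tmp_data)   # store the i-th model
}
# Each stored model can be used for prediction later
head(predict(list_models[[1]], newdata = mtcars))
head(predict(list_models[[2]], newdata = mtcars))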

I am using lm here, but this will work just as well with C5.0 as the predict function will work on either model object.
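For completeness, here is a minimal sketch of the same list pattern applied to a C5.0 cross-validation loop. It assumes the C50 package is installed and uses the built-in iris data as a stand-in for data_known; the names fold_id, models, new_rows and final_model are made up for illustration. It also shows one common way to handle the second part of the question: use the folds only to estimate the error, then refit on all known data for the final predictions. Treat it as one possible approach, not the only one.

```r
# Sketch only: assumes the C50 package is installed; iris stands in for data_known.
library(C50)

set.seed(42)                                          # reproducible fold assignment
k <- 5
fold_id <- sample(rep(1:k, length.out = nrow(iris)))  # one fold label per row

models <- vector("list", k)   # one fitted tree per fold
errs   <- numeric(k)          # one error rate per fold

for (i in seq_len(k)) {
  train <- iris[fold_id != i, ]
  test  <- iris[fold_id == i, ]
  models[[i]] <- C5.0(Species ~ ., data = train)      # keep the model instead of overwriting it
  pred <- predict(models[[i]], newdata = test)
  errs[i] <- mean(as.character(pred) != as.character(test$Species))
}

mean(errs)                    # cross-validated error estimate

# Any stored model can be applied to rows whose label is unknown
new_rows <- iris[c(1, 51, 101), -5]                   # hypothetical "unknown" data
predict(models[[1]], newdata = new_rows)

# Refit on all known data for the final predictions
final_model <- C5.0(Species ~ ., data = iris)
final_pred  <- predict(final_model, newdata = new_rows)

# Persist the list of models and read it back later
path <- tempfile(fileext = ".rds")
saveRDS(models, path)
models_loaded <- readRDS(path)
```

saveRDS/readRDS is used here instead of the save mentioned in the comment because it stores a single object and lets you choose the variable name when reading it back; save and load would work too.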
