I'm trying to implement some functions to compare five different machine learning models for predicting values in a regression problem.

My intention is to write a set of functions that train the different models and organize the results. The models I selected are: Lasso, Random Forest, SVM, Linear Model and Neural Network. To tune some of the models I intend to use Max Kuhn's references: https://topepo.github.io/caret/available-models.html. However, since each model requires different tuning parameters, I'm not sure how to set them:

First I set up the grid for tuning the 'nnet' model. Here I selected different numbers of nodes in the hidden layer and values of the decay coefficient:

my.grid <- expand.grid(size=seq(from = 1, to = 10, by = 1), decay = seq(from = 0.1, to = 0.5, by = 0.1))

Then I construct the function that will run each of the five models 5 times in a 6-fold cross-validation configuration:

 my_list_model <- function(model) {
  set.seed(1)
  train.control <- trainControl(method = "repeatedcv", 
         number = 6,
         repeats =  5,
         returnResamp = "all",
         savePredictions = "all")

# The tuning configuration of the machine learning models:
  set.seed(1)
  fit_m <- train(ST1 ~ ., 
         data = train, # my original dataframe, not shown in this code
         method = model, 
         metric = "RMSE", 
         preProcess = "scale", 
         trControl = train.control,
         linout = 1,       # linear activation function for the output
         trace = FALSE,
         maxit = 1000,
         tuneGrid = my.grid) # here is how I pass the 'nnet' tuning parameters

 return(fit_m)
 } 

Lastly, I execute the five models:

lapply(list(
Lass = "lasso", 
RF = "rf", 
SVM = "svmLinear",
OLS = "lm", 
NN = "nnet"), 
my_list_model) -> model_list

However, when I run this, it shows:

Error: The tuning parameter grid should not have columns fraction

From what I understood, I don't know how to specify the tuning parameters properly. If I drop the 'nnet' model and replace it, for example, with an XGBoost model in the penultimate line, it seems to work well and the results are calculated. That is, the problem seems to be with the 'nnet' tuning parameters.

So I think my real question is: how do I configure these different model parameters, in particular those of the 'nnet' model? In addition, since I didn't set up the parameters for lasso, random forest, svmLinear and the linear model, how were they tuned by the caret package?

1 Answer

my_list_model <- function(model, grd = NULL){  # grd: model-specific tuning grid (NULL = caret default)
  train.control <- trainControl(method = "repeatedcv", 
                            number = 6,
                            returnResamp = "all",
                            savePredictions = "all")

 # The tuning configurations of machine learning models:
 set.seed(1)
 fit_m <- train(Y ~., 
             data = df, # my original dataframe, not shown in this code
             method = model, 
             metric = "RMSE", 
             preProcess = "scale", 
             trControl = train.control,
             linout = 1,        #  linear activation function output
             trace = FALSE,
             maxit = 1000,
             tuneGrid = grd) # Here is how I call the tune of 'nnet' parameters
 return(fit_m)
 }

First, run the code below to see all the tuning parameters related to a model:

modelLookup('rf')
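
For reference, the other models from the question can be looked up the same way. A quick sketch; the parameter names are the ones caret documents for these methods, and 'fraction' for lasso is exactly the column the original error message complained about:

modelLookup('lasso')     # fraction
modelLookup('nnet')      # size, decay
modelLookup('svmLinear') # C
modelLookup('lm')        # intercept (effectively nothing to tune)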

Now make a grid for each model based on the lookup above:

svmGrid <-  expand.grid(C=c(3,2,1))
rfGrid <-  expand.grid(mtry=c(5,10,15))

Create a list of all the models' grids and make sure each model's name matches its name in the list:

grd_all <- list(svmLinear = svmGrid,
                rf = rfGrid)
model_list <- lapply(c("rf", "svmLinear"),
                     function(x) my_list_model(x, grd_all[[x]]))
model_list

[[1]]
Random Forest 

17 samples
3 predictor

Pre-processing: scaled (3) 
Resampling: Cross-Validated (6 fold, repeated 1 times) 
Summary of sample sizes: 14, 14, 15, 14, 14, 14, ... 
Resampling results across tuning parameters:

mtry  RMSE      Rsquared   MAE     
 5    63.54864  0.5247415  55.72074
10    63.70247  0.5255311  55.35263
15    62.13805  0.5765130  54.53411

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 15.

[[2]]
Support Vector Machines with Linear Kernel 

17 samples
3 predictor

Pre-processing: scaled (3) 
Resampling: Cross-Validated (6 fold, repeated 1 times) 
Summary of sample sizes: 14, 14, 15, 14, 14, 14, ... 
Resampling results across tuning parameters:

C  RMSE      Rsquared   MAE     
1  59.83309  0.5879396  52.26890
2  66.45247  0.5621379  58.74603
3  67.28742  0.5576000  59.55334

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was C = 1.
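
To run all five models from the question through the same function, the grid list can be extended in the same way. Below is only a sketch: lassoGrid and nnetGrid are hypothetical grids built from the parameter names shown by modelLookup(), and any model missing from the list (here 'lm') gets tuneGrid = NULL, in which case caret builds its own default grid (its size controlled by tuneLength). That is also how lasso, rf, svmLinear and lm would be tuned automatically when no grid is supplied.

# hypothetical grids for the remaining models, based on modelLookup()
lassoGrid <- expand.grid(fraction = seq(0.1, 0.9, by = 0.1))
nnetGrid  <- expand.grid(size = seq(1, 10, by = 1), decay = seq(0.1, 0.5, by = 0.1))

grd_all <- list(svmLinear = svmGrid,
                rf        = rfGrid,
                lasso     = lassoGrid,
                nnet      = nnetGrid)   # 'lm' is left out, so grd_all[["lm"]] is NULL

model_list <- lapply(c("lasso", "rf", "svmLinear", "lm", "nnet"),
                     function(x) my_list_model(x, grd_all[[x]]))

One caveat: linout, trace and maxit inside my_list_model are nnet-specific arguments. rf and svmLinear tolerated them in the run above, but other methods may not accept them, so you may need to pass them only when model == 'nnet'.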