Goal
I'd like to implement a LASSO model and check its viability on a training set according to the schematic shown here. (Schematic description: all data is split into testing and training sets. The training set is split via 5-fold cross-validation (CV) into resamples, and 10-fold CV is performed on each resample to find the optimal lambda.) The testing set is not available yet.
I'd like to fit a LASSO model and check its performance using nested CV: an inner CV (analysis and assessment sets) that finds the optimal lambda via grid search, and an outer CV that compares resamples 1, 2, 3, etc.
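To make the intended procedure concrete, here is a minimal sketch of the nested scheme written by hand around caret. It is only an illustration under assumptions: the data are simulated, PD is assumed to be numeric (so RMSE is the metric), and the names outer_folds, inner_ctl, tune_grid and nested are hypothetical. It uses createFolds() for the outer 5-fold split and an ordinary 10-fold train() call on each analysis set.

```r
library(caret)   # the glmnet package must also be installed

set.seed(1)
# Hypothetical stand-in for the training data: numeric outcome PD, 10 predictors
n <- 200
X <- matrix(rnorm(n * 10), n, 10, dimnames = list(NULL, paste0("x", 1:10)))
selection <- data.frame(PD = X[, 1] - 0.5 * X[, 2] + rnorm(n), X)

lambdas   <- 10^seq(-3, -1, length = 20)
tune_grid <- expand.grid(alpha = 1, lambda = lambdas)      # alpha = 1 is the LASSO
inner_ctl <- trainControl(method = "cv", number = 10)      # inner 10-fold CV

# Outer 5-fold split: each list element holds the row indices of one assessment set
outer_folds <- createFolds(selection$PD, k = 5)

nested <- do.call(rbind, lapply(names(outer_folds), function(fold) {
  held_out <- outer_folds[[fold]]
  # Inner loop: grid search over lambda using only the analysis set
  fit <- train(PD ~ ., data = selection[-held_out, ],
               method = "glmnet", trControl = inner_ctl, tuneGrid = tune_grid)
  # Outer loop: score the tuned model on the held-out assessment set
  pred <- predict(fit, newdata = selection[held_out, ])
  data.frame(fold        = fold,
             best_lambda = fit$bestTune$lambda,
             outer_rmse  = RMSE(pred, selection$PD[held_out]))
}))
nested   # one row per outer resample: its optimal lambda and held-out RMSE
```

Each row of nested then corresponds to one outer resample, so the optimal lambdas can be compared side by side. For a categorical PD the metric and the simulated outcome would need to change accordingly.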
Caret with 'repeatedcv'
trainControl with 'repeatedcv' from caret lets you specify number and repeats.
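library(caret)  # method = 'glmnet' also requires the glmnet package

# Grid of 20 candidate lambdas, log-spaced between 0.001 and 0.1
lambdas <- 10^seq(-3, -1, length = 20)

# 10-fold CV, repeated 5 times
trControl <- trainControl(
  method = 'repeatedcv',
  number = 10,
  repeats = 5,
  search = 'grid'
)

# alpha = 1 selects the LASSO penalty
tuneGrid <- expand.grid(alpha = 1, lambda = lambdas)

lasso <- train(
  PD ~ ., data = selection,
  method = 'glmnet',
  trControl = trControl,
  tuneGrid = tuneGrid
)

lasso$results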
With the code above, lasso$results is a data frame with 20 rows, presumably one row for each point on the defined grid. However, I'd like caret to find one optimal lambda per grid search using 10-fold CV (number = 10) and then compare those optimal lambdas, since the process marked in bold is performed multiple times (repeats = 5).
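For context, 'repeatedcv' averages the performance of each lambda over all 10 x 5 = 50 resamples, which is why results has one row per grid point. One possible way to look at each repeat separately is to keep the per-resample results with returnResamp = 'all' and aggregate them by repeat. The sketch below assumes simulated data, a numeric PD (so the metric column is RMSE), and hypothetical names such as per_rep and best_per_rep. Note that this is not nested CV: within each repeat, lambda is chosen and scored on the same folds.

```r
library(caret)   # the glmnet package must also be installed

set.seed(1)
# Hypothetical stand-in for the training data: numeric outcome PD, 10 predictors
n <- 200
X <- matrix(rnorm(n * 10), n, 10, dimnames = list(NULL, paste0("x", 1:10)))
selection <- data.frame(PD = X[, 1] - 0.5 * X[, 2] + rnorm(n), X)

lambdas   <- 10^seq(-3, -1, length = 20)
tune_grid <- expand.grid(alpha = 1, lambda = lambdas)

# returnResamp = "all" keeps the metric for every fold, repeat and lambda
ctl <- trainControl(method = "repeatedcv", number = 10, repeats = 5,
                    returnResamp = "all")
fit <- train(PD ~ ., data = selection, method = "glmnet",
             trControl = ctl, tuneGrid = tune_grid)

res <- fit$resample                          # one row per fold x repeat x lambda
res$Rep <- sub(".*\\.", "", res$Resample)    # "Fold01.Rep1" -> "Rep1"

# Mean RMSE of each lambda within each repeat (averaged over its 10 folds)
per_rep <- aggregate(RMSE ~ Rep + lambda, data = res, FUN = mean)

# Lambda with the lowest mean RMSE in each repeat
best_per_rep <- do.call(rbind, lapply(split(per_rep, per_rep$Rep), function(d) {
  d[which.min(d$RMSE), ]
}))
best_per_rep   # one row per repeat
```

Comparing the lambda column of best_per_rep across the five repeats gives a rough sense of how stable the selected penalty is.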