In mlr, there is a method to implement nested cross-validation (CV). In nested CV, the inner loop selects the best tuning parameters and the outer loop evaluates model performance. When I combine nested CV with feature selection, I'm a bit confused about what mlr returns from the best tuned model of the inner loop. For example, I want to first apply a filter that keeps only the features whose correlation with the outcome has a p-value < 0.05. In nested CV (described here in terms of training, validation and test sets), the procedure should be:
In the inner loop, for each training set, apply the filter, then tune the parameters we're interested in and evaluate them on the validation set. The inner loop thus yields the best tuning parameters and the feature set associated with them.
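The filter and tuning setup described above could be expressed in mlr roughly as follows. The code further down uses a filter named `t.test.filter` and the objects `ps`, `ctrl`, `inner` and `outer` without showing their definitions; this is a hypothetical sketch of them, assuming a binary classification task with numeric features. The filter body, the parameter ranges and the fold counts are illustrative assumptions, not the original settings. Scoring by the negative p-value is what makes a negative `fw.threshold` work, because mlr keeps features whose filter value is greater than or equal to the threshold.

```r
library(mlr)

# Hypothetical custom filter: scores each numeric feature by the NEGATIVE
# p-value of a pooled-variance two-sample t-test between the two classes.
# For a binary outcome this equals the p-value of the test for the
# (point-biserial) correlation between the feature and the outcome.
makeFilter(
  name = "t.test.filter",
  desc = "Negative t-test p-value (binary classification)",
  pkg = character(0L),
  supported.tasks = "classif",
  supported.features = "numerics",
  fun = function(task, nselect, ...) {
    data <- getTaskData(task)
    y <- droplevels(data[[getTaskTargetNames(task)]])
    feats <- getTaskFeatureNames(task)
    scores <- vapply(feats, function(f) {
      groups <- split(data[[f]], y)
      # t.test() errors on degenerate input (e.g. constant features); use p = 1
      tryCatch(-t.test(groups[[1]], groups[[2]], var.equal = TRUE)$p.value,
               error = function(e) -1)
    }, numeric(1))
    setNames(scores, feats)
  }
)

# Hypothetical SVM search space (log2 scale), searched with a grid
ps <- makeParamSet(
  makeNumericParam("cost", lower = -5, upper = 5, trafo = function(x) 2^x),
  makeNumericParam("gamma", lower = -5, upper = 5, trafo = function(x) 2^x)
)
ctrl <- makeTuneControlGrid(resolution = 5L)

# Inner loop: training/validation splits of each outer training set
inner <- makeResampleDesc("CV", iters = 3L, stratify = TRUE)
# Outer loop: held-out test folds, used only to estimate performance
outer <- makeResampleDesc("CV", iters = 5L, stratify = TRUE)
```

With `fw.threshold = -0.05`, a feature is kept when `-p >= -0.05`, i.e. when `p <= 0.05`.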
What I'm wondering is what the inner loop's best tuned model passes on to the outer loop for training. I can see two possibilities:
1. The inner loop returns only the best tuned parameters, not the selected feature subset. In the outer loop, the same filter is applied again, and the model is then trained on the training + validation set with the best tuned parameters.
2. The inner loop returns both the best tuned parameters and the selected feature subset. In the outer loop, the model is trained on the training + validation set using the best tuned parameters and the feature subset selected in the inner loop.
In my opinion, the first option is more logical. Here is part of my code:
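```r
library(mlr)
svm_learner <- makeLearner("classif.svm", predict.type = "prob",
                           fix.factors.prediction = TRUE)
# Custom filter (not built into mlr); the negative threshold assumes it scores features by -p.value
svm_filter <- makeFilterWrapper(learner = svm_learner,
                                fw.method = "t.test.filter", fw.threshold = -0.05)
# Tune the SVM parameters in ps with ctrl, using the inner resampling
svm_filter_nested <- makeTuneWrapper(svm_filter, par.set = ps,
                                     control = ctrl, resampling = inner)
# Outer resampling; models = TRUE keeps the fitted model of each outer fold
r <- resample(svm_filter_nested, task, resampling = outer, models = TRUE)
```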
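Which of the two behaviours applies can be checked by inspecting the outer-fold models that `models = TRUE` keeps. Below is a self-contained sketch on mlr's built-in `sonar.task`; the built-in `anova.test` filter (keeping the top 50% of features) and the small grid over `cost` are stand-ins chosen for illustration, not the setup above. With the real setup, only the inspection part after `resample()` is needed, plus `extract = getTuneResult` in the `resample()` call so that `getNestedTuneResultsX()` has tuning results to read. The accessor `x$learner.model$next.model` follows the pattern used in the mlr nested-resampling tutorial.

```r
library(mlr)

set.seed(123)
# Stand-in setup (illustrative only): anova.test filter keeping 50% of the
# features, SVM cost tuned over a small grid with 3-fold inner CV
base_lrn <- makeFilterWrapper(makeLearner("classif.svm"),
                              fw.method = "anova.test", fw.perc = 0.5)
ps <- makeParamSet(makeDiscreteParam("cost", values = c(0.1, 1, 10)))
tuned_lrn <- makeTuneWrapper(base_lrn, resampling = makeResampleDesc("CV", iters = 3L),
                             par.set = ps, control = makeTuneControlGrid(),
                             show.info = FALSE)
# Outer 3-fold CV; keep fitted models and each fold's tuning result
r <- resample(tuned_lrn, sonar.task, resampling = makeResampleDesc("CV", iters = 3L),
              models = TRUE, extract = getTuneResult, show.info = FALSE)

# Hyperparameters chosen by the inner loop, one row per outer iteration
tuned_pars <- getNestedTuneResultsX(r)
print(tuned_pars)

# Features used by each outer-fold model: x$learner.model$next.model is the
# filter-wrapped model that the tune wrapper fitted on that outer training
# set with the tuned hyperparameters
outer_feats <- lapply(r$models,
                      function(x) getFilteredFeatures(x$learner.model$next.model))
print(outer_feats)

# Outer-loop (test-fold) performance and its aggregate
print(r$measures.test)
print(r$aggr)
```

Comparing `outer_feats` across folds shows which features each outer-fold model was actually trained on.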