In mlr, it is possible to combine filter feature selection with hyperparameter tuning using nested cross-validation, e.g. with the following code.
lrn = makeFilterWrapper(learner = "regr.kknn", fw.method = "chi.squared")
ps = makeParamSet(
  makeDiscreteParam("fw.abs", values = 10:13),
  makeDiscreteParam("k", values = c(2, 3, 4))
)
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("CV", iters = 2)
outer = makeResampleDesc("Subsample", iters = 3)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)
res = resample(lrn, bh.task, outer, mse, extract = getTuneResult)
But as far as I know, it is not possible to do something like this using wrapper feature selection, e.g.:
lrn = makeFeatSelWrapper(learner = "regr.kknn", ww.method = "random") # imaginary code
ps = makeParamSet(
  makeDiscreteParam("maxit", values = 15),
  makeDiscreteParam("k", values = c(2, 3, 4))
) # imaginary code: no method parameter and no resampling provided
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("CV", iters = 2)
outer = makeResampleDesc("Subsample", iters = 3)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)
res = resample(lrn, bh.task, outer, mse, extract = getTuneResult)
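For reference, the nested-nested variant I am trying to avoid would, as far as I understand the mlr API, look roughly like the sketch below: a makeFeatSelWrapper (with its own inner-inner resampling) wrapped in a makeTuneWrapper. I have not benchmarked this; the resampling sizes are just placeholders.

```r
library(mlr)

# Inner-inner CV: used by the feature selection wrapper itself
innerInner = makeResampleDesc("CV", iters = 2)
lrn = makeFeatSelWrapper("regr.kknn", resampling = innerInner,
  control = makeFeatSelControlRandom(maxit = 15), show.info = FALSE)

# Inner CV: used to tune k on top of the feature selection wrapper
ps = makeParamSet(makeDiscreteParam("k", values = c(2, 3, 4)))
inner = makeResampleDesc("CV", iters = 2)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps,
  control = makeTuneControlGrid(), show.info = FALSE)

# Outer resampling: estimates generalization performance
outer = makeResampleDesc("Subsample", iters = 3)
res = resample(lrn, bh.task, outer, mse, extract = getTuneResult)
```

Each outer training set is split again for tuning, and each tuning training set is split yet again for feature selection, which is exactly the triple data reduction I would like to avoid.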
Is there a way to achieve something like this, especially in order to avoid nested-nested cross-validation? Is there any methodological reason why this would not be appropriate? Filter feature selection with a tuned threshold (number of features) looks quite similar to the wrapper approach: in both cases the additional hyperparameter is effectively a set of features, either derived from a filter (e.g. "chi-squared") plus a threshold (top 90%, 80%, 70%), or produced by a wrapper algorithm (random, GA, exhaustive, sequential), and the best feature set is chosen by inner-CV performance in both cases.
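To make the threshold variant concrete, here is a sketch of tuning the kept fraction of features (top 70%/80%/90%) instead of an absolute count; as far as I know, fw.perc is the percentage analogue of fw.abs in makeFilterWrapper.

```r
library(mlr)

# Filter wrapper as before, but tune the fraction of features kept (fw.perc)
lrn = makeFilterWrapper(learner = "regr.kknn", fw.method = "chi.squared")
ps = makeParamSet(
  makeDiscreteParam("fw.perc", values = c(0.7, 0.8, 0.9)),  # top 70/80/90%
  makeDiscreteParam("k", values = c(2, 3, 4))
)
lrn = makeTuneWrapper(lrn, resampling = makeResampleDesc("CV", iters = 2),
  par.set = ps, control = makeTuneControlGrid(), show.info = FALSE)
res = resample(lrn, bh.task, makeResampleDesc("Subsample", iters = 3),
  mse, extract = getTuneResult)
```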
I believe both approaches (nested CV with additional filter parameters, and nested-nested CV) are similar in computational complexity, but nested-nested CV reduces your training data further, which the first approach would avoid.
Is this a methodological error I am making, or is this a missing (probably not very popular) feature?
bh.task is a regression task. lm is also a regression method. You probably want to use mse as a measure for resampling. – jakob-r