In mlr, it is possible to combine filter feature selection with hyperparameter tuning using nested cross-validation, e.g. with the following code.
lrn = makeFilterWrapper(learner = "regr.kknn", fw.method = "chi.squared")
ps = makeParamSet(
  makeDiscreteParam("fw.abs", values = 10:13),
  makeDiscreteParam("k", values = c(2, 3, 4))
)
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("CV", iters = 2)
outer = makeResampleDesc("Subsample", iters = 3)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)
res = resample(lrn, bh.task, outer, mse, extract = getTuneResult)
But as far as I know, it is not possible to do something like this using wrapper feature selection, e.g.:
lrn = makeFeatSelWrapper(learner = "regr.kknn", ww.method = "random") # imaginary code
ps = makeParamSet(
  makeDiscreteParam("maxit", values = 15),
  makeDiscreteParam("k", values = c(2, 3, 4))
) # imaginary code: no method parameter and no resampling provided
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("CV", iters = 2)
outer = makeResampleDesc("Subsample", iters = 3)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)
res = resample(lrn, bh.task, outer, mse, extract = getTuneResult)
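For reference, the nested-nested variant I am trying to avoid would, as far as I understand the mlr API, look roughly like the sketch below: a makeFeatSelWrapper (with its own inner-inner resampling) wrapped in a makeTuneWrapper. I have not benchmarked this; the resampling sizes are just placeholders.

```r
library(mlr)

# Inner-inner CV: used by the feature selection wrapper itself
innerInner = makeResampleDesc("CV", iters = 2)
lrn = makeFeatSelWrapper("regr.kknn", resampling = innerInner,
  control = makeFeatSelControlRandom(maxit = 15), show.info = FALSE)

# Inner CV: used to tune k on top of the feature selection wrapper
ps = makeParamSet(makeDiscreteParam("k", values = c(2, 3, 4)))
inner = makeResampleDesc("CV", iters = 2)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps,
  control = makeTuneControlGrid(), show.info = FALSE)

# Outer resampling: estimates generalization performance
outer = makeResampleDesc("Subsample", iters = 3)
res = resample(lrn, bh.task, outer, mse, extract = getTuneResult)
```

Each outer training set is split again for tuning, and each tuning training set is split yet again for feature selection, which is exactly the triple data reduction I would like to avoid.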
Is there a way to achieve something like this, especially in order to avoid nested-nested cross-validation? Is there any methodological reason why this would not be appropriate? Filter feature selection with a tuned threshold (number of features) looks quite similar to the wrapper approach: in both cases the additional hyperparameter is effectively a set of features, either derived from a filter (e.g. "chi-squared") plus a threshold (top 90%, 80%, 70%), or produced by a wrapper algorithm (random, GA, exhaustive, sequential), and the best feature set is chosen by inner-CV performance in both cases.
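To make the threshold variant concrete, here is a sketch of tuning the kept fraction of features (top 70%/80%/90%) instead of an absolute count; as far as I know, fw.perc is the percentage analogue of fw.abs in makeFilterWrapper.

```r
library(mlr)

# Filter wrapper as before, but tune the fraction of features kept (fw.perc)
lrn = makeFilterWrapper(learner = "regr.kknn", fw.method = "chi.squared")
ps = makeParamSet(
  makeDiscreteParam("fw.perc", values = c(0.7, 0.8, 0.9)),  # top 70/80/90%
  makeDiscreteParam("k", values = c(2, 3, 4))
)
lrn = makeTuneWrapper(lrn, resampling = makeResampleDesc("CV", iters = 2),
  par.set = ps, control = makeTuneControlGrid(), show.info = FALSE)
res = resample(lrn, bh.task, makeResampleDesc("Subsample", iters = 3),
  mse, extract = getTuneResult)
```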
I believe both approaches (nested CV with additional filter parameters, and nested-nested CV) are similar in computational complexity, but nested-nested CV reduces your training data further, which the first approach would avoid.
Is this a methodological error I am making, or is this a missing (probably not very popular) feature?
bh.task is a regression task. lm is also a regression method. You probably want to use mse as a measure for resampling. – jakob-r