4 votes

In mlr, it is possible to do filter feature selection together with hyperparameter tuning using nested cross validation, e.g. with the following code.

lrn = makeFilterWrapper(learner = "regr.kknn", fw.method = "chi.squared")
ps = makeParamSet(makeDiscreteParam("fw.abs", values = 10:13),
                  makeDiscreteParam("k", values = c(2, 3, 4)))
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("CV", iters = 2)
outer = makeResampleDesc("Subsample", iters = 3)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)
res = resample(lrn, bh.task, outer, mse, extract = getTuneResult)

But as far as I know, it is not possible to do something like this using wrapper feature selection, e.g.:

lrn = makeFeatSelWrapper(learner = "regr.kknn", ww.method = "random") # imaginary code
ps = makeParamSet(makeDiscreteParam("maxit", values = 15),
                  makeDiscreteParam("k", values = c(2, 3, 4))) # imaginary code, no method parameter & no resampling provided
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("CV", iters = 2)
outer = makeResampleDesc("Subsample", iters = 3)
lrn = makeTuneWrapper(lrn, resampling = inner, par.set = ps, control = ctrl, show.info = FALSE)
res = resample(lrn, bh.task, outer, mse, extract = getTuneResult)

Is there a way to achieve something like this, especially in order to avoid nested-nested cross-validation? Is there any methodological reason why this would not be appropriate? After all, filter feature selection with a tuned parameter (the number of features) looks quite similar to the wrapper approach: the additional hyperparameter is effectively a certain set of features, either derived from a filter (e.g. "chi-squared") plus a threshold (top 90%, 80%, 70%) or produced by a wrapper algorithm (random, GA, exhaustive, sequential), and in both cases the best set of features is chosen based on inner-CV performance.
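
To make the analogy concrete, here is a minimal sketch of the filter variant I have in mind, reusing the setup from the first code block above, with the filter threshold fw.perc tuned as an ordinary hyperparameter:

# sketch: tune the fraction of features kept by the filter (top 90%/80%/70%) together with k
lrn.f = makeFilterWrapper(learner = "regr.kknn", fw.method = "chi.squared")
ps.f = makeParamSet(makeDiscreteParam("fw.perc", values = c(0.9, 0.8, 0.7)),
                    makeDiscreteParam("k", values = c(2, 3, 4)))
lrn.f = makeTuneWrapper(lrn.f, resampling = makeResampleDesc("CV", iters = 2),
                        par.set = ps.f, control = makeTuneControlGrid(), show.info = FALSE)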

I believe both approaches (nested CV with additional parameters for filtering, and nested-nested CV) are similar in terms of computational complexity, but you might not want to reduce your training dataset even further with nested-nested CV, which the first approach would avoid.

Is this a methodological error on my part, or is this simply a missing (and probably not very popular) feature?

bh.task is a regression task; lm is also a regression method. You probably want to use mse as a measure for resampling. – jakob-r
@jakob-r ah, sorry, I just mindlessly copy-pasted a piece of code to show the concept I am asking about. – Matek

2 Answers

2 votes

This feature has been available in mlr since July. You need to install the development version from GitHub:

devtools::install_github("mlr-org/mlr")

The TuneWrapper needs to be in the inner resampling loop, while the FeatSelWrapper needs to be in the outer resampling loop. Here is an example using iris.task and rpart with backward selection:

library(mlr)

tuning parameters:

ps <- makeParamSet(
  makeNumericParam("cp", lower = 0.01, upper = 0.1),
  makeIntegerParam("minsplit", lower = 10, upper = 20)
)

grid search:

ctrl <- makeTuneControlGrid(resolution = 5L)

specify learner:

lrn <- makeLearner("classif.rpart", predict.type = "prob")

generate a tune wrapper:

lrn <- makeTuneWrapper(lrn, resampling = cv3, par.set = ps, control = ctrl, show.info = FALSE)

generate a feature selection wrapper:

lrn <- makeFeatSelWrapper(lrn,
                          resampling = cv3,
                          control = makeFeatSelControlSequential(method = "sbs"), show.info = FALSE)

perform resample:

res <- resample(lrn, task = iris.task,  resampling = cv3, show.info = TRUE, models = TRUE)

Note that even this small example will take some time to run.

res
#output
Resample Result
Task: iris_example
Learner: classif.rpart.tuned.featsel
Aggr perf: mmce.test.mean=0.1000000
Runtime: 92.1436
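
Since models = TRUE was set, the features selected in each outer fold can be inspected from the stored models (a quick sketch using mlr's getFeatSelResult accessor):

# feature selection result per outer resampling iteration
lapply(res$models, getFeatSelResult)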

One can do the same thing without the outermost resample:

lrn <- makeLearner("classif.rpart", predict.type = "prob")
lrn <- makeTuneWrapper(lrn, resampling = cv3, par.set = ps, control = makeTuneControlGrid(), show.info = TRUE)
res2 <- selectFeatures(learner = lrn, task = iris.task, resampling = cv3,
                       control = makeFeatSelControlSequential(method = "sbs"), show.info = TRUE)
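
selectFeatures returns a FeatSelResult object; as a short usage note, the chosen feature set and its performance can be read from it like this:

res2$x  # selected features
res2$y  # performance of the selected feature set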

1 vote

If I understand you correctly, you are basically asking how to tune a FeatSelWrapper? This is a bit complex, as feature selection (in mlr) depends on resampling because it is basically tuning: we don't tune learner parameters, but we tune the selection of features to optimize a performance measure. To calculate that measure we need resampling.

So what you propose, in other words, is to tune the "feature tuning" by choosing the best parameters for the feature selection algorithm. This naturally introduces another layer of nested resampling.

But it is debatable whether this is necessary, as the choice of feature selection method usually depends on your available resources and other circumstances.

What you can do is to benchmark different feature selection methods:

inner = makeResampleDesc("CV", iters = 2)
outer = makeResampleDesc("Subsample", iters = 3)
settings = list(random1 = makeFeatSelControlRandom(maxit = 15), random2 = makeFeatSelControlRandom(maxit = 20))
lrns = Map(function(x, xn) {
  lrn = makeFeatSelWrapper(learner = "regr.lm", control = x, resampling = inner)
  lrn$id = paste0(lrn$id, ".", xn)
  lrn
}, x = settings, xn = names(settings))
benchmark(lrns, bh.task, outer, list(mse, timeboth))
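
If you capture the result of the benchmark() call above in an object (say bmr), the aggregated performances of the different feature selection settings can be compared with:

# assuming bmr holds the result of the benchmark() call above
getBMRAggrPerformances(bmr, as.df = TRUE)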