I've been exploring the marvelous mlr package with the Titanic data set. My problem is implementing a random forest. More specifically, I'd like to tune the cutoff (i.e. the threshold which assigns leaves which are not pure to a given class). The problem is that the cutoff argument takes two values; however, I can only figure out hyperparameter tuning in mlr for a single value.

The code:

library(mlr)
library(dplyr)

dTrain <- read.csv('path/to/data/')

#Defining the Task
trainTask <- makeClassifTask(data = dTrain %>% 
                           select(-Name, -Ticket, -Cabin) %>% 
                           filter(complete.cases(.)), 
                         target = "Survived", 
                         id = "PassengerId")

#Defining the Learner
rfLRN <- makeLearner("classif.randomForest")

#Defining the Parameter Space
ps <- makeParamSet(
 makeDiscreteParam("cutoff", values = list(c(.5,.5), c(.75,.25)))
)

This is where the problem lies: cutoff needs two values, but I'm not sure how to pass them. The above attempt is wrong. I've tried several other parameter constructors, e.g. makeDiscreteVectorParam, but to no avail. Any tips?

If instead I attempt to tune a parameter like mtry (i.e. the number of features to select from at a given split), everything works fine.

#Defining the Hyperparameter Space
ps = makeParamSet(
  makeDiscreteParam("mtry", values = c(2,3,4,5))
)

#Defining Resampling
cvTask <- makeResampleDesc("CV", iters=5L)

#Defining Search
search <-  makeTuneControlGrid()

#Tune!
tune <- tuneParams(learner = rfLRN
                 ,task = trainTask
                 ,resampling = cvTask
                 ,measures = list(acc)
                 ,par.set = ps
                 ,control = search
                 ,show.info = TRUE)
For those with a similar problem, a better approach is to use makeNumericParam("cutoff", lower = .2, upper = .8, trafo = function(x) c(x, 1-x)) instead of makeDiscreteParam("cutoff", values = list(a=c(.50,.50), b=c(.75,.25))). It takes much less code to get an exhaustive search. - Jacob H
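A minimal sketch of the idea in the comment above, for readers who want to see what the trafo does to each candidate value. The helper name cutoff_trafo and the grid resolution of 7 are assumptions made for illustration; they do not come from the original post. The first part is base R only; the mlr part runs only if the package is installed, and it has not been checked against the poster's data.

```r
# Trafo from the comment: one tuned number x becomes the pair c(x, 1 - x)
# (cutoff_trafo is a hypothetical helper name)
cutoff_trafo <- function(x) c(x, 1 - x)

# Candidate values a 7-point grid would visit on [0.2, 0.8]
# (the resolution of 7 is an assumption for illustration)
grid <- seq(0.2, 0.8, length.out = 7)

# One row per candidate: the two cutoff values handed to the learner
cutoffs <- t(sapply(grid, cutoff_trafo))
colnames(cutoffs) <- c("class1", "class2")
print(cutoffs)

# With mlr installed, the same trafo can be attached to the parameter itself
if (requireNamespace("mlr", quietly = TRUE)) {
  library(mlr)
  ps <- makeParamSet(
    makeNumericParam("cutoff", lower = 0.2, upper = 0.8, trafo = cutoff_trafo)
  )
  # Grid search needs a resolution for numeric parameters
  search <- makeTuneControlGrid(resolution = 7L)
}
```

Each row of cutoffs sums to 1, so a single tuned number covers the whole family of two-class cutoffs without having to name every pair by hand.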

1 Answer

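Looks like you need to assign names to these classification cutoffs: when the values of a discrete parameter are passed as a list of vectors, the list elements must be uniquely named. E.g.: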

#Defining the Parameter Space
ps <- makeParamSet(
  makeDiscreteParam("cutoff", values = list(
    a=c(.50,.50),
    b=c(.75,.25)))
)

Output:

> tune <- tuneParams(learner = rfLRN
+                    ,task = trainTask
+                    ,resampling = cvTask
+                    ,measures = list(acc)
+                    ,par.set = ps
+                    ,control = search
+                    ,show.info = TRUE)
[Tune] Started tuning learner classif.randomForest for parameter set:
           Type len Def Constr Req Tunable Trafo
cutoff discrete   -   -    a,b   -    TRUE     -
With control class: TuneControlGrid
Imputation value: -0
[Tune-x] 1: cutoff=a
[Tune-y] 1: acc.test.mean=0.828; time: 0.0 min
[Tune-x] 2: cutoff=b
[Tune-y] 2: acc.test.mean=0.776; time: 0.0 min
[Tune] Result: cutoff=a : acc.test.mean=0.828
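
To see why the two named cutoffs can lead to different accuracy, it may help to recall how randomForest uses the argument: according to its documentation, the winning class is the one with the largest ratio of vote proportion to cutoff. The sketch below illustrates that rule in base R. The vote shares and the helper pick_class are made up for illustration and are not taken from the tuning run above.

```r
# Hypothetical vote shares for one observation (made-up numbers):
# 60% of the trees vote for the first class, 40% for the second
votes <- c(first = 0.6, second = 0.4)

# The two named cutoffs from the parameter set above
cutoffs <- list(a = c(.50, .50), b = c(.75, .25))

# Pick the class with the largest votes / cutoff ratio
# (pick_class is a hypothetical helper, not part of randomForest or mlr)
pick_class <- function(votes, cutoff) names(votes)[which.max(votes / cutoff)]

predicted <- sapply(cutoffs, pick_class, votes = votes)
print(predicted)
```

With cutoff a the ratios are 1.2 and 0.8, so the first class wins; with cutoff b they are 0.8 and 1.6, so the same votes yield the second class.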