I've been exploring the marvelous mlr package with the Titanic data set. My problem is implementing a random forest. More specifically, I'd like to tune the cutoff (i.e. the threshold which assigns leaves which are not pure to a given class). The problem is that the cutoff argument takes two values; however, I can only figure out hyperparameter tuning in mlr for a single value.
The code:
library(mlr)
library(dplyr)

dTrain <- read.csv('path/to/data/')

# Defining the Task
# Note: id is the task's name, not an ID column, so PassengerId stays in as a feature
trainTask <- makeClassifTask(
  data = dTrain %>%
    select(-Name, -Ticket, -Cabin) %>%
    filter(complete.cases(.)),
  target = "Survived",
  id = "PassengerId"
)

# Defining the Learner
rfLRN <- makeLearner("classif.randomForest")

# Defining the Parameter Space
ps <- makeParamSet(
  makeDiscreteParam("cutoff", values = list(c(.5, .5), c(.75, .25)))
)
This is where the problem lies: cutoff needs two values, but I'm not sure how to pass them. The above attempt is wrong. I've tried several other parameter constructors, e.g. makeDiscreteVectorParam, but to no avail. Any tips?
If instead I attempt to tune a parameter like mtry (i.e. the number of features to select from at a given split), everything works fine.
# Defining the Hyperparameter Space
ps <- makeParamSet(
  makeDiscreteParam("mtry", values = c(2, 3, 4, 5))
)

# Defining Resampling
cvTask <- makeResampleDesc("CV", iters = 5L)

# Defining Search
search <- makeTuneControlGrid()

# Tune!
tune <- tuneParams(
  learner = rfLRN,
  task = trainTask,
  resampling = cvTask,
  measures = list(acc),
  par.set = ps,
  control = search,
  show.info = TRUE
)
Use makeNumericParam("cutoff", lower = .2, upper = .8, trafo = function(x) c(x, 1-x)) instead of makeDiscreteParam("cutoff", values = list(a=c(.50,.50), b=c(.75,.25))). Much less coding to get an exhaustive search. – Jacob H
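
For what it's worth, the trafo in the comment above just maps one number x to the pair c(x, 1 - x). Here is a minimal base-R sketch of the candidate cutoffs that would produce. The helper name cutoff_trafo and the 7-point grid are my own assumptions for illustration; neither is defined by mlr, and the grid a tuner actually evaluates depends on its settings.

```r
# Hypothetical helper: same mapping as the trafo in the comment above.
cutoff_trafo <- function(x) c(x, 1 - x)

# Assumed grid: 7 evenly spaced points between the bounds .2 and .8.
grid <- seq(0.2, 0.8, length.out = 7)

# One two-element cutoff vector per grid point.
cutoffs <- lapply(grid, cutoff_trafo)

# Each pair should sum to 1 (up to floating-point error).
sums <- vapply(cutoffs, sum, numeric(1))

# Show the candidates as a 7 x 2 matrix, one row per grid point.
print(do.call(rbind, cutoffs))
```

Only one number is searched over, yet every candidate is a two-class cutoff vector, which seems to be why this needs less code than listing each pair by hand.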