1
votes

I want to tune hyperparameters for random forest using the MLR package. I have a few questions:

1) How do I decide which of the parameters I should tune? I heard something about keeping num.trees as high as computationally possible and tuning mtry? (I couldn't find anything online backing this up though)

2) What range should I tune mtry over? Is there a good rule of thumb, e.g. between 0 and 1/3 of the number of features? If so, how would I integrate that into the code below when I have different data sets (i.e., what would I write instead of lower = 0 and upper = 10)?

3) Lastly, does it even make sense to create the learner twice, once with the makeLearner function, where I set the parameter in par.vals, and then once with the makeTuneWrapper function? Doesn’t the second call overwrite the first anyway?

learnerRF = makeLearner("regr.ranger", par.vals = list("num.trees" = 5000))
parsRF = makeParamSet(
  makeIntegerParam("mtry", lower = 0, upper = 10)
)
tuneRF = makeTuneControlGrid()
inner = makeResampleDesc("CV", iters = 10)
learnerRF = makeTuneWrapper(learnerRF, resampling = inner, par.set = parsRF, control = tuneRF, show.info = FALSE)

2 Answers

3
votes

You can look into these two papers, which try to answer your questions:

http://jmlr.org/papers/v18/17-269.html

https://arxiv.org/abs/1804.03515

tuneRanger is an R package built specifically for tuning random forests.

1
votes

The answer to 1 and 2 is the same: as much as you can computationally afford. Tune as many parameters as you can, over ranges as wide as you can. The more configurations the search covers, the larger the potential gain.
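For the "what would I write instead of lower = 0 and upper = 10" part, one option is to derive the bounds from the data set itself. The following is only a sketch: mtry_bounds is a hypothetical helper (it is not part of mlr), and mtcars with target mpg is just a stand-in data set. Since mtry is the number of features tried at each split, 1 to the number of features covers the whole meaningful range.

```r
# Hypothetical helper (not part of mlr): derive integer bounds for mtry
# from a data frame and the name of its target column.
mtry_bounds = function(data, target) {
  stopifnot(is.data.frame(data), target %in% names(data))
  p = ncol(data) - 1L          # number of features, excluding the target
  list(lower = 1L, upper = p)  # mtry = features tried per split, so 1..p
}

# Stand-in data set: mtcars has 10 features plus the target mpg.
b = mtry_bounds(mtcars, "mpg")

# Plug the bounds into the parameter set from the question.
# Guarded so the helper above can be tried without mlr installed.
if (requireNamespace("mlr", quietly = TRUE)) {
  library(mlr)
  parsRF = makeParamSet(
    makeIntegerParam("mtry", lower = b$lower, upper = b$upper)
  )
}
```

If you already have an mlr task, getTaskNFeats(task) should give the same upper bound without a helper.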

Regarding 3, you don't have to create a separate learner before calling makeTuneWrapper() (and it doesn't make sense to set parameters in makeLearner() that you later tune anyway). You can do both in one step like this:

learnerRF = makeTuneWrapper("regr.ranger", resampling = inner, par.set = parsRF, 
                            control = tuneRF, show.info = FALSE)
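If you still want to hold an untuned parameter such as num.trees fixed, creating the learner first is a reasonable way to do that: parameters set in par.vals that are not part of par.set are left alone by the wrapper. A minimal sketch, assuming the mlr and ranger packages are installed; mtcars, num.trees = 500 and 5-fold CV are illustrative choices to keep it quick, not values from the question.

```r
library(mlr)  # the regr.ranger learner also needs the ranger package

# Stand-in regression task (illustrative data set).
task = makeRegrTask(data = mtcars, target = "mpg")

# num.trees is fixed here and is NOT in the tuning set below.
baseRF = makeLearner("regr.ranger", par.vals = list(num.trees = 500))

# Only mtry is tuned, from 1 up to the number of features in the task.
parsRF = makeParamSet(
  makeIntegerParam("mtry", lower = 1, upper = getTaskNFeats(task))
)
tuneRF = makeTuneControlGrid()
inner = makeResampleDesc("CV", iters = 5)

# The wrapper searches over mtry; num.trees is passed through unchanged.
learnerRF = makeTuneWrapper(baseRF, resampling = inner, par.set = parsRF,
                            control = tuneRF, show.info = FALSE)

# Training the wrapped learner runs the tuning, then fits the final model.
model = train(learnerRF, task)
getTuneResult(model)$x  # the selected mtry
getHyperPars(baseRF)    # the fixed num.trees
```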