I was recently advised to switch my machine learning framework to mlr3, but I am finding the transition somewhat more difficult than I expected. In my current project I am dealing with highly imbalanced data, which I would like to balance before training my model. I found this tutorial, which explains how to deal with imbalance via pipelines and a graph learner:

https://mlr3gallery.mlr-org.com/posts/2020-03-30-imbalanced-data/

I am afraid that this approach will also perform class balancing when predicting on new data. Why would I want to do this and reduce my test sample?

So two questions arise:

  1. Am I correct not to balance classes in testing data?
  2. If so, is there a way of doing this in mlr3?

Of course I could just subset the training data manually and deal with imbalance myself but that's just not fun anymore! :)

Anyway, thanks for any answers,
Cheers!

1 Answer

To answer your questions:

> I am afraid that this approach will also perform class balancing when predicting on new data.

This is not correct. Where did you get this from?

> Am I correct not to balance classes in testing data?

Your assumption is correct. Class balancing usually works by adding or removing rows (or by adjusting weights). Adding or removing rows should not happen during prediction, because we want exactly one predicted value for each row in the data. Weights, on the other hand, usually have no effect during the prediction phase.

> If so, is there a way of doing this in mlr3?

Just use the PipeOp as described in the blog post. During training it performs the specified over- or undersampling, while it does nothing during prediction.
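If you want to convince yourself, here is a minimal sketch of how this could be checked. The task (german_credit), the learner (classif.rpart), the 800-row split and the balancing parameters are just assumptions for illustration and are not taken from your project or from the blog post.

```r
library(mlr3)
library(mlr3pipelines)

set.seed(1)

# Example task chosen for illustration only; any imbalanced
# classification task would do.
task = tsk("german_credit")

# Undersample the majority class down to the size of the minority class.
# The parameter values here are illustrative assumptions.
po_under = po("classbalancing", id = "undersample",
  adjust = "major", reference = "minor", ratio = 1, shuffle = FALSE)

# Training phase: rows of the majority class are dropped.
train_out = po_under$train(list(task))$output
print(table(train_out$truth()))

# Prediction phase: the task is passed through with all of its rows.
predict_out = po_under$predict(list(task))$output
print(predict_out$nrow == task$nrow)

# The same holds when the PipeOp is wrapped in a GraphLearner.
graph_learner = GraphLearner$new(po_under %>>% lrn("classif.rpart"))

train_ids = sample(task$row_ids, 800)
test_ids = setdiff(task$row_ids, train_ids)

graph_learner$train(task, row_ids = train_ids)
prediction = graph_learner$predict(task, row_ids = test_ids)

# One prediction per test row, so the test sample is not reduced.
print(length(prediction$row_ids) == length(test_ids))
```

Both checks should come back TRUE: the balancing only changes the data the learner is fitted on, and every row you hand to predict gets exactly one prediction.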

Cheers,