Lately I have been advised to switch my machine learning framework to mlr3, but I am finding the transition somewhat more difficult than I expected. In my current project I am dealing with highly imbalanced data, which I would like to balance before training my model. I found this tutorial, which explains how to deal with imbalance via pipelines and a graph learner:
https://mlr3gallery.mlr-org.com/posts/2020-03-30-imbalanced-data/
I am afraid that this approach will also perform class balancing when predicting on new data. Why would I want to do this and reduce my test sample?
So two questions arise:
- Am I correct not to balance the classes in the test data?
- If so, is there a way of doing this in mlr3?
Of course I could just subset the training data manually and deal with the imbalance myself, but that's just not fun anymore! :)
Anyway, thanks for any answers,
Cheers!