CV or train/predict in mlr3

Question

In the post "The 'Cross-Validation - Train/Predict' misunderstanding", Patrick Schratz mentions that:

(a) CV is done to get an estimate of a model’s performance.

(b) Train/predict is done to create the final predictions (which your boss might use to make some decisions on).

Does this mean that in mlr3, if we are in academia and need to publish papers, we should use CV because we intend to compare the performance of different algorithms? And in industry, if the plan is to train a model once and then use it again and again on new data to make predictions, should we use the train/predict methods provided by mlr3?

Or have I completely misunderstood this?

Thank you

pat-s · Accepted Answer · 2021-02-06T21:07:02

You always need a CV if you want to make a statement about a model's performance.

If you want to use the model to make predictions on unseen data, do a single fit on the data you have and then predict.

So in practice, you need both: CV + "train+predict".
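To make the two steps concrete, here is a minimal sketch in mlr3. It is only an illustration under assumptions that are not part of the original answer: the built-in `iris` task, the `classif.rpart` learner, 5-fold CV, accuracy as the measure, and a few rows of the task standing in for "new" data.

```r
library(mlr3)

set.seed(42)

# Assumed example setup (not from the original post)
task <- tsk("iris")
learner <- lrn("classif.rpart")

# (a) CV: estimate how well this kind of model performs
resampling <- rsmp("cv", folds = 5)
rr <- resample(task, learner, resampling)
cv_acc <- rr$aggregate(msr("classif.acc"))
print(cv_acc)

# (b) train/predict: fit once on all available data, then predict
learner$train(task)

# Stand-in for unseen data: feature columns only, no target
new_data <- task$data(rows = 1:5, cols = task$feature_names)
prediction <- learner$predict_newdata(new_data)
print(prediction$response)
```

The CV result is what you would report as the performance estimate; the learner trained in step (b) is the one you would keep and reuse for predictions.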

PS: Your post is not really a fit for Stack Overflow since it is not related to a coding problem. For statistical questions please see https://stats.stackexchange.com/.

PS2: If you talk about a post, please include the link. I am the author of the post in this case but most other people might not know what you are talking about ;)
