Just switched to mlr for my machine learning workflow. I'm wondering if it's possible to tune hyperparameters using a separate validation set. From my limited understanding, makeResampleDesc and makeResampleInstance only resample from within the training data.
My goal is to tune parameters on a validation set and evaluate the final model on the test set, to prevent overfitting and data leakage.
Here is what I did code-wise:
## Create training, validation and test tasks
train_task <- makeClassifTask(data = train_data, target = "y", positive = 1)
validation_task <- makeClassifTask(data = validation_data, target = "y")
test_task <- makeClassifTask(data = test_data, target = "y")
## Attempt to tune parameters with separate validation data
tuned_params <- tuneParams(
task = train_task,
resampling = makeResampleInstance("Holdout", task = validation_task),
...
)
From the error message, it looks like the evaluation is still trying to resample from the training set (the resampling instance size doesn't match the training data size):
00001: Error in resample.fun(learner2, task, resampling, measures = measures, : Size of data set: 19454 and resampling instance: 1666333 differ!
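One workaround I've been considering, in case it helps frame the question: combine the training and validation rows into a single task and pin down which rows play each role with mlr's makeFixedHoldoutInstance(), so tuneParams never splits randomly. This is just a sketch, not tested against my data; my_learner, my_par_set, and my_control are placeholders for whatever learner, parameter set, and tuning control are in use.

```r
library(mlr)

## Stack train and validation rows into one data frame / task.
## Row order matters: train rows first, then validation rows.
combined_data <- rbind(train_data, validation_data)
combined_task <- makeClassifTask(data = combined_data, target = "y", positive = "1")

n_train <- nrow(train_data)
n_total <- nrow(combined_data)

## Fixed holdout: fit on the first n_train rows, evaluate on the rest.
fixed_holdout <- makeFixedHoldoutInstance(
  train.inds = seq_len(n_train),           # indices of the train_data rows
  test.inds  = seq(n_train + 1, n_total),  # indices of the validation_data rows
  size       = n_total
)

tuned_params <- tuneParams(
  learner    = my_learner,    # placeholder learner
  task       = combined_task,
  resampling = fixed_holdout, # every tuning iteration uses the same split
  par.set    = my_par_set,    # placeholder parameter set
  control    = my_control     # placeholder tuning control
)
```

If this is the intended approach, I'd expect the final model to then be trained on train + validation (or train only) and assessed once on test_task.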
Does anyone know what I should do? Am I setting this up the right way?