
I run the code below. If I deactivate the instantiation (as shown), the results of the three benchmark experiments differ, and the conclusion about which learner performs better may change.

How can I address this issue? One option might be to average over a large number of resamplings. I could write code for this myself, but perhaps this is already available as an option when calling benchmark()?

library(mlr3)
library(mlr3pipelines)  # the graph learners (glrn_*) are built with mlr3pipelines

# task, glrn_knn_pca and glrn_knn_nopca are defined earlier (not shown)

resampling = rsmp("cv", folds = 20)
#resampling$instantiate(task)   # results below will (and should) differ if instantiation is not performed here

design = benchmark_grid(
  tasks = task,
  learners = list(glrn_knn_pca, glrn_knn_nopca),
  resamplings = resampling
)

design2 = benchmark_grid(
  tasks = task,
  learners = list(glrn_knn_pca, glrn_knn_nopca),
  resamplings = resampling
)


design3 = benchmark_grid(
  tasks = task,
  learners = list(glrn_knn_pca, glrn_knn_nopca),
  resamplings = resampling
)


bmr = benchmark(design)
bmr2 = benchmark(design2)
bmr3 = benchmark(design3)

# aggregated AUC per learner; the three runs can disagree on which learner wins
bmr$aggregate(msr("classif.auc"))
bmr2$aggregate(msr("classif.auc"))
bmr3$aggregate(msr("classif.auc"))
How large is the difference? If it's very large, you might need a different way of evaluating. For example, leave-one-out CV should always give you the same results, but it will be very expensive. – Lars Kotthoff

Have you tried setting a seed? – pat-s

@pat-s: Yes, I tried. My problem is that setting a seed would cover up the problem that the decision about which algorithm performs better depends strongly on the seed. So I need to prevent this from happening. – ds_col
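
As an aside to the first comment, leave-one-out CV is available as its own resampling in mlr3. A minimal sketch, assuming the task and one of the graph learners from the question's (unshown) setup; AUC is scored on the pooled predictions because a single test observation per iteration has no per-fold AUC:

library(mlr3)

# leave-one-out CV: one test observation per iteration, so the split is
# fully determined by the data and repeated runs give identical partitions
rr = resample(task, glrn_knn_pca, rsmp("loo"))

# score the combined predictions across all iterations
rr$prediction()$score(msr("classif.auc"))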

2 Answers

2 votes

It looks to me like you want to use repeated CV to minimize the variability introduced by the partitioning.

Instead of resampling = rsmp("cv", folds = 20), you could use resampling = rsmp("repeated_cv", folds = 20, repeats = 100) to create 100 different resampling scenarios and benchmark all your learners across them.

This is a common approach in ML to reduce the impact of a single partitioning.
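
A minimal sketch of what this could look like, assuming task, glrn_knn_pca and glrn_knn_nopca from the question:

library(mlr3)

# 20-fold CV repeated 100 times gives 2000 train/test splits per learner
resampling = rsmp("repeated_cv", folds = 20, repeats = 100)

design = benchmark_grid(
  tasks = task,
  learners = list(glrn_knn_pca, glrn_knn_nopca),
  resamplings = resampling
)

bmr = benchmark(design)

# AUC is now averaged over all repetitions, so a single partitioning
# has much less influence on the ranking of the two learners
bmr$aggregate(msr("classif.auc"))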

1 vote

If you want to find out which learner performs better, it is not sufficient to compare only the aggregated performance measures. Statistical tests and plots for benchmark comparisons are implemented in the mlr3benchmark package.
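
As a rough sketch of that workflow (the functions below, as_benchmark_aggr() and the Friedman-test methods, are assumed from mlr3benchmark, and the rank-based tests generally expect results on more than one task):

library(mlr3benchmark)
library(ggplot2)  # provides the autoplot generic

# collapse the BenchmarkResult into one aggregated score per task/learner pair
bma = as_benchmark_aggr(bmr, measures = msr("classif.auc"))

# global test: do the learners differ at all?
bma$friedman_test()

# pairwise post-hoc comparisons between learners
bma$friedman_posthoc()

# critical-difference style plot of the learner ranking
autoplot(bma, type = "cd")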