I think that I've found the answer to what I'm looking for. In brief, what I'd like to do is:
Create a graph pipeline with multiple learners. I'd like some of the learners to be inserted with fixed hyperparameters, while for others I'd like to have their hyperparameters tuned. Then, I'd like to benchmark them and select the 'best' one. I'd also like the benchmarking of learners to happen under different class balancing strategies, namely, do nothing, up-sample and down-sample. The optimal parameter settings for the up/down-sampling (e.g. ratio) would also be determined during tuning.
Two examples below, one that almost does what I want, the other doing exactly what I want.
Example 1: Build a pipe that includes all learners, that is, learners with fixed hyperparameters, as well as learners whose hyperparameters require tuning
As will be shown, it seems like a bad idea to have both kinds of learners (i.e. with fixed and tunable hyperparameters), because tuning the pipe disregards the learners with tunable hyperparameters.
####################################################################################
# Build Machine Learning pipeline that:
# 1. Imputes missing values (optional).
# 2. Tunes and benchmarks a range of learners.
# 3. Handles imbalanced data in different ways.
# 4. Identifies optimal learner for the task at hand.
# Abbreviations
# 1. td: Tuned. Learner already tuned with optimal hyperparameters, as found empirically by Probst et al. (2019). See http://jmlr.csail.mit.edu/papers/volume20/18-444/18-444.pdf
# 2. tn: Tunable. Optimal hyperparameters for the learner are to be determined by the tuner.
# 3. raw: Raw data, i.e. class imbalance is not treated in any way.
# 4. up: Data up-sampling to balance the classes.
# 5. down: Data down-sampling to balance the classes.
# References
# Probst et al. (2019). Tunability: Importance of Hyperparameters of Machine Learning Algorithms. JMLR 20. http://jmlr.csail.mit.edu/papers/volume20/18-444/18-444.pdf
####################################################################################
# Required packages (mlr3 ecosystem, plus dplyr/tibble for the stratified split)
library(mlr3)
library(mlr3learners)   # provides the classif.xgboost learner
library(mlr3pipelines)
library(mlr3tuning)
library(paradox)
library(data.table)
library(dplyr)
library(tibble)

task <- tsk('sonar')
# Indices for splitting data into training and test sets
train.idx <- task$data() %>%
  select(Class) %>%
  rownames_to_column() %>%
  group_by(Class) %>%
  sample_frac(2 / 3) %>%   # Stratified sample to maintain proportions between classes.
  ungroup() %>%
  select(rowname) %>%
  deframe() %>%
  as.numeric()
test.idx <- setdiff(seq_len(task$nrow), train.idx)
# Define training and test sets in task format
task_train <- task$clone()$filter(train.idx)
task_test <- task$clone()$filter(test.idx)
# Define class balancing strategies
class_counts <- table(task_train$truth())
upsample_ratio <- class_counts[class_counts == max(class_counts)] /
class_counts[class_counts == min(class_counts)]
downsample_ratio <- 1 / upsample_ratio
# 1. Enrich minority class by factor 'ratio'
po_over <- po("classbalancing", id = "up", adjust = "minor",
reference = "minor", shuffle = FALSE, ratio = upsample_ratio)
# 2. Reduce majority class by factor '1/ratio'
po_under <- po("classbalancing", id = "down", adjust = "major",
reference = "major", shuffle = FALSE, ratio = downsample_ratio)
# 3. No class balancing
po_raw <- po("nop", id = "raw") # Pipe operator for 'do nothing' ('nop'), i.e. don't up/down-balance the classes.
# We will be using an XGBoost learner throughout, with different hyperparameter settings.
# Define the XGBoost learner with the optimal hyperparameters of Probst et al. (2019).
# The learner will be added to the pipeline later on, both with and without class balancing.
xgb_td <- lrn("classif.xgboost", predict_type = 'prob')
xgb_td$param_set$values <- list(
booster = "gbtree",
nrounds = 2563,
max_depth = 11,
min_child_weight = 1.75,
subsample = 0.873,
eta = 0.052,
colsample_bytree = 0.713,
colsample_bylevel = 0.638,
lambda = 0.101,
alpha = 0.894
)
xgb_td_raw <- GraphLearner$new(
po_raw %>>%
po('learner', xgb_td, id = 'xgb_td'),
predict_type = 'prob'
)
xgb_tn_raw <- GraphLearner$new(
po_raw %>>%
po('learner', lrn("classif.xgboost",
predict_type = 'prob'), id = 'xgb_tn'),
predict_type = 'prob'
)
xgb_td_up <- GraphLearner$new(
po_over %>>%
po('learner', xgb_td, id = 'xgb_td'),
predict_type = 'prob'
)
xgb_tn_up <- GraphLearner$new(
po_over %>>%
po('learner', lrn("classif.xgboost",
predict_type = 'prob'), id = 'xgb_tn'),
predict_type = 'prob'
)
xgb_td_down <- GraphLearner$new(
po_under %>>%
po('learner', xgb_td, id = 'xgb_td'),
predict_type = 'prob'
)
xgb_tn_down <- GraphLearner$new(
po_under %>>%
po('learner', lrn("classif.xgboost",
predict_type = 'prob'), id = 'xgb_tn'),
predict_type = 'prob'
)
learners_all <- list(
xgb_td_raw,
xgb_tn_raw,
xgb_td_up,
xgb_tn_up,
xgb_td_down,
xgb_tn_down
)
names(learners_all) <- sapply(learners_all, function(x) x$id)
# Create pipeline as a graph. This way, pipeline can be plotted. Pipeline can then be converted into a learner with GraphLearner$new(pipeline).
# Pipeline is a collection of Graph Learners (type ?GraphLearner in the command line for info).
# Each GraphLearner is a td or tn model (see abbreviations above) with or without class balancing.
# Up/down or no sampling happens within each GraphLearner, otherwise an error during tuning indicates that there are >= 2 data sources.
# Up/down or no sampling within each GraphLearner can be specified by chaining the relevant pipe operators (function po(); type ?PipeOp in command line) with the PipeOp of each learner.
graph <-
#po("imputehist") %>>% # Optional. Impute missing values only when using classifiers that can't handle them (e.g. Random Forest).
po("branch", names(learners_all)) %>>%
gunion(unname(learners_all)) %>>%
po("unbranch")
graph$plot() # Plot pipeline
pipe <- GraphLearner$new(graph) # Convert pipeline to learner
pipe$predict_type <- 'prob' # Don't forget to specify we want to predict probabilities and not classes.
ps_table <- as.data.table(pipe$param_set)
View(ps_table[, 1:4])
# Set hyperparameter ranges for the tunable learners
ps_xgboost <- ps_table$id %>%
lapply(
function(x) {
if (grepl('_tn', x)) {
if (grepl('.booster', x)) {
ParamFct$new(x, levels = "gbtree")
} else if (grepl('.nrounds', x)) {
ParamInt$new(x, lower = 100, upper = 110)
} else if (grepl('.max_depth', x)) {
ParamInt$new(x, lower = 3, upper = 10)
} else if (grepl('.min_child_weight', x)) {
ParamDbl$new(x, lower = 0, upper = 10)
} else if (grepl('.subsample', x)) {
ParamDbl$new(x, lower = 0, upper = 1)
} else if (grepl('.eta', x)) {
ParamDbl$new(x, lower = 0.1, upper = 0.6)
} else if (grepl('.colsample_bytree', x)) {
ParamDbl$new(x, lower = 0.5, upper = 1)
} else if (grepl('.gamma', x)) {
ParamDbl$new(x, lower = 0, upper = 5)
}
}
}
)
ps_xgboost <- Filter(Negate(is.null), ps_xgboost)
ps_xgboost <- ParamSet$new(ps_xgboost)
# Set parameter ranges for the class balancing strategies
ps_class_balancing <- ps_table$id %>%
lapply(
function(x) {
if (all(grepl('up.', x), grepl('.ratio', x))) {
ParamDbl$new(x, lower = 1, upper = upsample_ratio)
} else if (all(grepl('down.', x), grepl('.ratio', x))) {
ParamDbl$new(x, lower = downsample_ratio, upper = 1)
}
}
)
ps_class_balancing <- Filter(Negate(is.null), ps_class_balancing)
ps_class_balancing <- ParamSet$new(ps_class_balancing)
# Define parameter set
param_set <- ParamSetCollection$new(list(
ParamSet$new(list(pipe$param_set$params$branch.selection$clone())), # ParamFct can be copied.
ps_xgboost,
ps_class_balancing
))
# Add dependencies so that a hyperparameter can only be set when its branch is selected.
# For instance, in the mlr3 gallery example linked below, mtry can only be set if the pipe is configured to use the Random Forest (ranger).
# In a similar manner, we want to add a dependency between, e.g., the 'booster' hyperparameter of an 'xgb_tn' branch and the corresponding value of "branch.selection".
# See https://mlr3gallery.mlr-org.com/tuning-over-multiple-learners/
param_set$ids()[-1] %>%
lapply(
function(x) {
aux <- names(learners_all) %>%
sapply(
function(y) {
grepl(y, x)
}
)
aux <- names(aux[aux])
param_set$add_dep(x, "branch.selection",
CondEqual$new(aux))
}
)
# Set up tuning instance
instance <- TuningInstance$new(
task = task_train,
learner = pipe,
resampling = rsmp('cv', folds = 2),
measures = msr("classif.bbrier"),
#measures = prc_micro,
param_set,
terminator = term("evals", n_evals = 3))
tuner <- TunerRandomSearch$new()
# Tune pipe learner to find best-performing branch
tuner$tune(instance)
instance$result
instance$archive()
instance$archive(unnest = "tune_x") # Unnest the tuner search space values
pipe$param_set$values <- instance$result$params
pipe$train(task_train)
pred <- pipe$predict(task_test)
pred$confusion
Note that the tuner disregards the tuning of the tunable learners and focuses on the already-tuned learners only. This can be confirmed by inspecting instance$result: the only things that have been tuned for the tunable learners are the class-balancing parameters, which are not actually learner hyperparameters.
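One quick way to confirm this from the archive (just a sketch, using the objects created above; the exact column names depend on the pipe's id prefixes):
arx <- instance$archive(unnest = "tune_x")
# Thanks to the dependencies, each row only carries values for the parameters of the branch that was sampled.
arx[, c("branch.selection", grep("ratio|xgb_tn", names(arx), value = TRUE)), with = FALSE]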
Example 2: Build a pipe that includes tunable learners only, find the 'best' one, and then benchmark it against the learners with fixed hyperparameters at a second stage.
Step 1: Build pipe for tunable learners
learners_all <- list(
#xgb_td_raw,
xgb_tn_raw,
#xgb_td_up,
xgb_tn_up,
#xgb_td_down,
xgb_tn_down
)
names(learners_all) <- sapply(learners_all, function(x) x$id)
# Create pipeline as a graph. This way, pipeline can be plotted. Pipeline can then be converted into a learner with GraphLearner$new(pipeline).
# Pipeline is a collection of Graph Learners (type ?GraphLearner in the command line for info).
# Each GraphLearner is a td or tn model (see abbreviations above) with or without class balancing.
# Up/down or no sampling happens within each GraphLearner, otherwise an error during tuning indicates that there are >= 2 data sources.
# Up/down or no sampling within each GraphLearner can be specified by chaining the relevant pipe operators (function po(); type ?PipeOp in command line) with the PipeOp of each learner.
graph <-
#po("imputehist") %>>% # Optional. Impute missing values only when using classifiers that can't handle them (e.g. Random Forest).
po("branch", names(learners_all)) %>>%
gunion(unname(learners_all)) %>>%
po("unbranch")
graph$plot() # Plot pipeline
pipe <- GraphLearner$new(graph) # Convert pipeline to learner
pipe$predict_type <- 'prob' # Don't forget to specify we want to predict probabilities and not classes.
ps_table <- as.data.table(pipe$param_set)
View(ps_table[, 1:4])
ps_xgboost <- ps_table$id %>%
lapply(
function(x) {
if (grepl('_tn', x)) {
if (grepl('.booster', x)) {
ParamFct$new(x, levels = "gbtree")
} else if (grepl('.nrounds', x)) {
ParamInt$new(x, lower = 100, upper = 110)
} else if (grepl('.max_depth', x)) {
ParamInt$new(x, lower = 3, upper = 10)
} else if (grepl('.min_child_weight', x)) {
ParamDbl$new(x, lower = 0, upper = 10)
} else if (grepl('.subsample', x)) {
ParamDbl$new(x, lower = 0, upper = 1)
} else if (grepl('.eta', x)) {
ParamDbl$new(x, lower = 0.1, upper = 0.6)
} else if (grepl('.colsample_bytree', x)) {
ParamDbl$new(x, lower = 0.5, upper = 1)
} else if (grepl('.gamma', x)) {
ParamDbl$new(x, lower = 0, upper = 5)
}
}
}
)
ps_xgboost <- Filter(Negate(is.null), ps_xgboost)
ps_xgboost <- ParamSet$new(ps_xgboost)
ps_class_balancing <- ps_table$id %>%
lapply(
function(x) {
if (all(grepl('up.', x), grepl('.ratio', x))) {
ParamDbl$new(x, lower = 1, upper = upsample_ratio)
} else if (all(grepl('down.', x), grepl('.ratio', x))) {
ParamDbl$new(x, lower = downsample_ratio, upper = 1)
}
}
)
ps_class_balancing <- Filter(Negate(is.null), ps_class_balancing)
ps_class_balancing <- ParamSet$new(ps_class_balancing)
param_set <- ParamSetCollection$new(list(
ParamSet$new(list(pipe$param_set$params$branch.selection$clone())), # ParamFct can be copied.
ps_xgboost,
ps_class_balancing
))
# Add dependencies so that a hyperparameter can only be set when its branch is selected.
# For instance, in the mlr3 gallery example linked below, mtry can only be set if the pipe is configured to use the Random Forest (ranger).
# In a similar manner, we want to add a dependency between, e.g., the 'booster' hyperparameter of an 'xgb_tn' branch and the corresponding value of "branch.selection".
# See https://mlr3gallery.mlr-org.com/tuning-over-multiple-learners/
param_set$ids()[-1] %>%
lapply(
function(x) {
aux <- names(learners_all) %>%
sapply(
function(y) {
grepl(y, x)
}
)
aux <- names(aux[aux])
param_set$add_dep(x, "branch.selection",
CondEqual$new(aux))
}
)
# Set up tuning instance
instance <- TuningInstance$new(
task = task_train,
learner = pipe,
resampling = rsmp('cv', folds = 2),
measures = msr("classif.bbrier"),
#measures = prc_micro,
param_set,
terminator = term("evals", n_evals = 3))
tuner <- TunerRandomSearch$new()
# Tune pipe learner to find best-performing branch
tuner$tune(instance)
instance$result
instance$archive()
instance$archive(unnest = "tune_x") # Unnest the tuner search space values
pipe$param_set$values <- instance$result$params
pipe$train(task_train)
pred <- pipe$predict(task_test)
pred$confusion
Note that instance$result now returns optimal values for the learners' hyperparameters too, and not just for the class-balancing parameters.
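For example (a sketch, reusing instance$result$params as above):
res <- instance$result$params     # parameter values of the best configuration
res$branch.selection              # which branch (learner + balancing strategy) won
res[grep("xgb_tn", names(res))]   # the tuned XGBoost hyperparameters of that branch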
Step 2: Benchmark the 'best' tunable learner (now tuned) against the learners with fixed hyperparameters
# Define the resampling and instantiate it so that the same splits are always used
resampling <- rsmp("cv", folds = 2)
set.seed(123)
resampling$instantiate(task_train)
bmr <- benchmark(
design = benchmark_grid(
task_train,
learner = list(pipe, xgb_td_raw, xgb_td_up, xgb_td_down), # tuned pipe plus the fixed-hyperparameter learners
resampling
),
store_models = TRUE # Only needed if you want to inspect the models
)
bmr$aggregate(msr("classif.bbrier"))
A few issues to consider
- I should probably have created a second, separate pipe for the learners with fixed hyperparameters, so that at least their class-balancing parameters get tuned. The two pipes (tunable and fixed hyperparameters) would then be benchmarked against each other with benchmark().
- I should probably also have used the same resampling strategy from beginning to end, i.e. instantiate the resampling right before tuning the first pipe, so that the same splits are used for the second pipe and for the final benchmark (see the sketch after this list).
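A rough sketch of both points combined (pipe_td is hypothetical here: a second GraphLearner built like pipe, but from the fixed-hyperparameter branches xgb_td_raw / xgb_td_up / xgb_td_down, with only branch.selection and the class-balancing ratios exposed to the tuner):
# Instantiate the resampling once, up front, and reuse it for tuning and for the final benchmark.
resampling_shared <- rsmp("cv", folds = 2)
set.seed(123)
resampling_shared$instantiate(task_train)
# ... tune 'pipe' (tunable branches) and 'pipe_td' (fixed branches) using resampling_shared ...
bmr_final <- benchmark(benchmark_grid(
  task_train,
  learners = list(pipe, pipe_td),
  resamplings = resampling_shared
))
bmr_final$aggregate(msr("classif.bbrier"))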
Comments/validation more than welcome.
(special thanks to missuse for the constructive comments)