I am trying to integrate the MiniBatchKmeans function from the ClusterR package into mlr. Following the docs, I have made the following changes:
- Created makeRLearner.cluster.MiniBatchKmeans
- Created trainLearner.cluster.MiniBatchKmeans
- Created predictLearner.cluster.MiniBatchKmeans
- Registered the above S3 methods (as described here)
At this point, I am able to create the learner and call train and predict on it. However, a problem occurs when I create the learner without providing a value for "clusters".
The underlying function (ClusterR::MiniBatchKmeans) does not define a default value for the argument "clusters". Following the mlr approach, I have tried to supply a default for "clusters" through the par.vals argument. However, this default is ignored.
My code:
#' @export
makeRLearner.cluster.MiniBatchKmeans = function() {
  makeRLearnerCluster(
    cl = "cluster.MiniBatchKmeans",
    package = "ClusterR",
    par.set = makeParamSet(
      makeIntegerLearnerParam(id = "clusters", lower = 1L),
      makeIntegerLearnerParam(id = "batch_size", default = 10L, lower = 1L),
      makeIntegerLearnerParam(id = "num_init", default = 1L, lower = 1L),
      makeIntegerLearnerParam(id = "max_iters", default = 100L, lower = 1L),
      makeNumericLearnerParam(id = "init_fraction", default = 1, lower = 0),
      makeDiscreteLearnerParam(id = "initializer", default = "kmeans++",
        values = c("optimal_init", "quantile_init", "kmeans++", "random")),
      makeIntegerLearnerParam(id = "early_stop_iter", default = 10L, lower = 1L),
      makeLogicalLearnerParam(id = "verbose", default = FALSE,
        tunable = FALSE),
      makeUntypedLearnerParam(id = "CENTROIDS", default = NULL),
      makeNumericLearnerParam(id = "tol", default = 1e-04, lower = 0),
      makeNumericLearnerParam(id = "tol_optimal_init", default = 0.3, lower = 0),
      makeIntegerLearnerParam(id = "seed", default = 1L)
    ),
    # Intended default, since ClusterR::MiniBatchKmeans has none for clusters
    par.vals = list(clusters = 2L),
    properties = c("numerics", "prob"),
    name = "MiniBatchKmeans",
    note = "Note",
    short.name = "MBatchKmeans",
    callees = c("MiniBatchKmeans", "predict_MBatchKMeans")
  )
}
#' @export
trainLearner.cluster.MiniBatchKmeans = function(.learner, .task, .subset, .weights = NULL, ...) {
  ClusterR::MiniBatchKmeans(getTaskData(.task, .subset), ...)
}
#' @export
predictLearner.cluster.MiniBatchKmeans = function(.learner, .model, .newdata, ...) {
  if (.learner$predict.type == "prob") {
    pred = ClusterR::predict_MBatchKMeans(data = .newdata,
      CENTROIDS = .model$learner.model$centroids,
      fuzzy = TRUE, ...)
    res = pred$fuzzy_clusters
    return(res)
  } else {
    pred = ClusterR::predict_MBatchKMeans(data = .newdata,
      CENTROIDS = .model$learner.model$centroids,
      fuzzy = FALSE, ...)
    res = as.integer(pred)
    return(res)
  }
}
The problem (default value of clusters in par.vals above is ignored):
## When defining a value of clusters, it works as expected
lrn <- makeLearner("cluster.MiniBatchKmeans", clusters = 3L)
getLearnerParVals(lrn)
# The below commented lines are printed
# $clusters
# [1] 3
## When not providing a value for clusters, default is not used
lrn <- makeLearner("cluster.MiniBatchKmeans")
getLearnerParVals(lrn)
# The below commented lines are printed
# named list()
Any advice on why I am seeing this behavior? I checked the code of other learners (such as cluster.kmeans and cluster.kkmeans), and they successfully define default values in the same way I have. Additionally, here is documentation stating that this is the right way to do it.
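To make that comparison concrete, a minimal diagnostic sketch along the following lines might help narrow down where the value gets lost. It assumes mlr is attached and that the packages cluster.kmeans depends on are installed; the direct constructor call only runs if my makeRLearner.cluster.MiniBatchKmeans is defined in the session. The variable names ref and raw are just illustrative, and I have deliberately not included any expected output.

```r
library(mlr)

# Built-in clustering learner, used here only as a reference point.
ref <- makeLearner("cluster.kmeans")
print(getLearnerParVals(ref))

# Call my constructor directly (bypassing makeLearner) and inspect the
# par.vals stored on the learner object it returns.
if (exists("makeRLearner.cluster.MiniBatchKmeans")) {
  raw <- makeRLearner.cluster.MiniBatchKmeans()
  print(raw$par.vals)
}
```

If raw$par.vals contains clusters but makeLearner("cluster.MiniBatchKmeans") still returns an empty list, that would suggest the value is being dropped somewhere inside makeLearner() rather than in my constructor.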
Here is my code on GitHub, in case it's helpful for reproducing the problem. There is an added test file (in tests/testthat), but that has issues of its own.
Edit 1 - Actual Error Message
Here is the actual error message that I see when trying to train a learner without explicitly providing a value for "clusters":
lrn <- makeLearner("cluster.MiniBatchKmeans")
train(lrn, cluster_task)
Error in ClusterR::MiniBatchKmeans(getTaskData(.task, .subset), ...) :
argument "clusters" is missing, with no default
10. ClusterR::MiniBatchKmeans(getTaskData(.task, .subset), ...) at RLearner_cluster_MiniBatchKmeans.R#32
9. trainLearner.cluster.MiniBatchKmeans(.learner = structure(list(
       id = "cluster.MiniBatchKmeans", type = "cluster", package = "ClusterR",
       properties = c("numerics", "prob"), par.set = structure(list(
           pars = list(clusters = structure(list(id = "clusters", ... at trainLearner.R#24
8. (function (.learner, .task, .subset, .weights = NULL, ...)
   {
       UseMethod("trainLearner")
   })(.learner = structure(list(id = "cluster.MiniBatchKmeans", ...
7. do.call(trainLearner, pars) at train.R#96
6. fun3(do.call(trainLearner, pars)) at train.R#96
5. fun2(fun3(do.call(trainLearner, pars))) at train.R#96
4. fun1({
       learner.model = fun2(fun3(do.call(trainLearner, pars)))
   }) at train.R#96
3. force(expr) at helpers.R#93
2. measureTime(fun1({
       learner.model = fun2(fun3(do.call(trainLearner, pars)))
   })) at train.R#96
1. train(lrn, cluster_task)