I followed the mlr3 documentation on imputing data with pipelines. However, the model that I have trained does not allow predictions if a column contains NA.

Do you have any idea why it doesn't work?

train step

library(mlr3)
library(mlr3learners)
library(mlr3pipelines)


data("mtcars", package = "datasets")
data = mtcars[, 1:3]
str(data)
task_mtcars = TaskRegr$new(id="cars", backend = data, target = "mpg")


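# Pipeline pieces: missing-value indicators, histogram imputation, scaling, learner
imp_missind = po("missind")
imp_num     = po("imputehist", param_vals = list(affect_columns = selector_type("numeric")))
scale       = po("scale")
learner     = lrn("regr.ranger")

# Branch 1 imputes and scales; branch 2 adds missing-value indicators
graph = po("copy", 2) %>>%
  gunion(list(imp_num %>>% scale, imp_missind)) %>>%
  po("featureunion") %>>%
  po(learner)
graph$plot()

graphlearner = GraphLearner$new(graph)
graphlearner$train(task_mtcars)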

predict step

# Take one row of the training data and set cyl to NA
data = task_mtcars$data()[12, ]
data[1, cyl := NA]
predict(graphlearner, data)

The error is

Error: Missing data in columns: cyl.
1 Answer

The example in the mlr3gallery seems to work for your case, so you basically have to switch the order of imputehist and missind inside gunion().
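
A minimal sketch of the reordered graph, assuming the same packages and data as in the question (the variable name newdata is only illustrative). The only intended change is that missind comes first in gunion(), so the imputed and scaled columns are joined last:

```r
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

data("mtcars", package = "datasets")
task_mtcars = TaskRegr$new(id = "cars", backend = mtcars[, 1:3], target = "mpg")

imp_missind = po("missind")
imp_num     = po("imputehist", param_vals = list(affect_columns = selector_type("numeric")))

# missind first, imputation branch second (order switched compared to the question)
graph = po("copy", 2) %>>%
  gunion(list(imp_missind, imp_num %>>% po("scale"))) %>>%
  po("featureunion") %>>%
  po(lrn("regr.ranger"))

graphlearner = GraphLearner$new(graph)
graphlearner$train(task_mtcars)

# One row with cyl set to NA (newdata is a hypothetical name)
newdata = task_mtcars$data()[12, ]
newdata$cyl = NA_real_
pred = predict(graphlearner, newdata)
pred
```

Whether the order matters may depend on the installed mlr3pipelines version, since the behaviour described in this answer is a bug rather than intended design.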

Another approach would be to set the "which" hyperparameter of missind to "all", which forces the creation of an indicator for every column, whether or not it had missing values during training.
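
As a rough sketch of this second option, assuming the same setup as in the question, the graph keeps its original order and only the missind operator changes (newdata is again just an illustrative name):

```r
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

data("mtcars", package = "datasets")
task_mtcars = TaskRegr$new(id = "cars", backend = mtcars[, 1:3], target = "mpg")

# which = "all": create an indicator for every affected column,
# not only for columns that had missing values during training
imp_missind = po("missind", param_vals = list(which = "all"))
imp_num     = po("imputehist", param_vals = list(affect_columns = selector_type("numeric")))

# Same branch order as in the question
graph = po("copy", 2) %>>%
  gunion(list(imp_num %>>% po("scale"), imp_missind)) %>>%
  po("featureunion") %>>%
  po(lrn("regr.ranger"))

graphlearner = GraphLearner$new(graph)
graphlearner$train(task_mtcars)

# One row with cyl set to NA (newdata is a hypothetical name)
newdata = task_mtcars$data()[12, ]
newdata$cyl = NA_real_
pred = predict(graphlearner, newdata)
pred
```

Note that this adds one indicator feature per input column, so the learner sees more features than with the default setting.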

This is actually a bug: missind returns the full task when it is trained on data with no missing values, and that output in turn overwrites the imputed values. Thanks a lot for spotting it. I am trying to fix it in this PR.