
I'm using the R mlr package because it lets me use multiple classification methods and tune their parameters through the same interface.

But it changed my Positive Class.

In my dataset I need to predict "HasWriteOff", which has the value "1" or "2". "1" is the majority class, far more frequent than "2", so the classes are imbalanced. I set the positive class to "2" in the makeClassifTask function, but after prediction, when I checked the confusion matrix, it showed the Positive Class as "1".

Here is my code:

I set the positive class here:

library(mlr)

train_task <- makeClassifTask(data = data.frame(train_data), target = "HasWriteOff", positive = "2")
test_task <- makeClassifTask(data = data.frame(test_data), target = "HasWriteOff", positive = "2")
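
(Side note: as far as I know, makeClassifTask expects the target column to be a factor, with positive being one of its levels. A minimal sketch of that check, assuming HasWriteOff might still be stored as numeric:)

# assumption: HasWriteOff may be numeric; makeClassifTask wants a factor target
train_data$HasWriteOff <- as.factor(train_data$HasWriteOff)
test_data$HasWriteOff <- as.factor(test_data$HasWriteOff)
levels(train_data$HasWriteOff)  # should show "1" and "2"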

Train and predict with XGBoost:

set.seed(410)
getParamSet("classif.xgboost")
xg_learner <- makeLearner("classif.xgboost", predict.type = "response")
xg_learner$par.vals <- list(
  objective = "binary:logistic",
  eval_metric = "error",
  nrounds = 250
)
xg_param <- makeParamSet(
  makeIntegerParam("nrounds", lower = 200, upper = 600),
  makeIntegerParam("max_depth", lower = 3, upper = 20),
  makeNumericParam("lambda", lower = 0.55, upper = 0.60),
  makeNumericParam("eta", lower = 0.001, upper = 0.5),
  makeNumericParam("subsample", lower = 0.10, upper = 0.80),
  makeNumericParam("min_child_weight", lower = 1, upper = 5),
  makeNumericParam("colsample_bytree", lower = 0.2, upper = 0.8)
)
rancontrol <- makeTuneControlRandom(maxit = 100L)
cv_xg <- makeResampleDesc("CV", iters = 3L)
xg_tune <- tuneParams(learner = xg_learner, task = train_task, resampling = cv_xg,
                      measures = acc, par.set = xg_param, control = rancontrol)
xg_final <- setHyperPars(learner = xg_learner, par.vals = xg_tune$x)
xgmodel <- mlr::train(xg_final, train_task)
xgpredict <- predict(xgmodel, test_task)
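
(To double-check what mlr itself recorded, a minimal sketch using the objects above; getTaskDesc() and calculateConfusionMatrix() are mlr functions, so the exact printout may differ between versions:)

# the positive class is stored on the task itself
getTaskDesc(train_task)$positive      # should print "2"
# mlr's own confusion matrix for the prediction object
calculateConfusionMatrix(xgpredict)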

Check the confusion matrix here:

library(caret)

nb_prediction <- xgpredict$data$response
dCM <- confusionMatrix(test_data$HasWriteOff, nb_prediction)
dCM

Output

Accuracy : 0.9954
95% CI : (0.9916, 0.9978)
No Information Rate : 0.9784
P-Value [Acc > NIR] : 5.136e-11
Kappa : 0.8913
Mcnemar's Test P-Value : 1
Sensitivity : 0.9977
Specificity : 0.8936
Pos Pred Value : 0.9977
Neg Pred Value : 0.8936
Prevalence : 0.9784
Detection Rate : 0.9761
Detection Prevalence : 0.9784
Balanced Accuracy : 0.9456
'Positive' Class : 1

As you can see here, the 'Positive' Class is 1.

I have checked the other methods I'm using here; they don't have a 'positive' parameter to set.

Do you know how I can actually set the positive class to the minority class "2"? I'm trying to see whether the specificity can be higher if I set the minority class as the positive class.

Comment from Lars Kotthoff: This sounds like a bug. Could you provide a complete reproducible example please?

1 Answer


Oh, I just found that this method also needs the positive class set:

dCM <- confusionMatrix(test_data$HasWriteOff, nb_prediction, positive = "2")

Yesterday I didn't check the confusionMatrix function, because I thought the positive class would be taken from the methods used before predict.

However, I just checked the R documentation for confusionMatrix; for the positive parameter it says:

If there are only two factor levels, the first level will be used as the "positive" result

So yesterday it simply chose the first factor level, the majority class "1", no matter what I had defined as the positive class before.
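
For completeness, a short sketch of the two ways to get "2" treated as the positive class by caret's confusionMatrix, following the documentation quoted above (the variable names are the ones from my code; named arguments are only there to make the data/reference roles explicit):

library(caret)

# option 1: state the positive class explicitly
dCM <- confusionMatrix(data = nb_prediction, reference = test_data$HasWriteOff, positive = "2")

# option 2: reorder the factor levels so "2" is the first level,
# which confusionMatrix uses as "positive" by default
pred <- factor(nb_prediction, levels = c("2", "1"))
truth <- factor(test_data$HasWriteOff, levels = c("2", "1"))
dCM2 <- confusionMatrix(data = pred, reference = truth)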