How do I set an optimal threshold for an XGBoost classifier? The default threshold is 0.5. Is there a built-in function or parameter I can use to change it?
1 Answer
If you are using Python, you are looking for the predict_proba() API instead of the usual predict() API. predict_proba() returns probabilities, which you can then map to a class using whatever threshold value you choose.
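As a minimal sketch of that mapping: the helper name apply_threshold and the hard-coded probabilities below are made up for illustration, and the array simply stands in for what predict_proba() would return from a fitted binary classifier (one column per class, positive class in column 1).

```python
import numpy as np

def apply_threshold(proba, threshold=0.5):
    """Map positive-class probabilities to 0/1 labels using a custom threshold."""
    proba = np.asarray(proba)
    # predict_proba() returns one column per class; column 1 is the positive class.
    positive = proba[:, 1] if proba.ndim == 2 else proba
    return (positive >= threshold).astype(int)

# With a fitted classifier this would be: proba = model.predict_proba(X_test)
# Hard-coded stand-in values are used here so the sketch runs on its own.
proba = np.array([[0.9, 0.1], [0.35, 0.65], [0.15, 0.85]])

print(apply_threshold(proba))       # [0 1 1] with the default 0.5 cut-off
print(apply_threshold(proba, 0.8))  # [0 0 1] with a stricter 0.8 cut-off
```

Raising the threshold trades recall for precision: fewer rows are labelled positive, but those that are carry a higher score.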
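Since you mentioned Spark MLlib, you might be using Scala or Java with xgboost4j. Options exist there as well; for example, see Booster.predict: https://xgboost.readthedocs.io/en/latest/jvm/scaladocs/xgboost4j/ml/dmlc/xgboost4j/scala/Booster.html#predict(data:ml.dmlc.xgboost4j.scala.DMatrix,outPutMargin:Boolean,treeLimit:Int):Array[Array[Float]]
The parameter to look at is outPutMargin. When it is true, predict returns the raw, untransformed margin score; when it is false, it returns the transformed prediction, which for a binary logistic objective is a probability that you can threshold yourself.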
To decide on the threshold, you can use the ROC curve, or compare your business outcome against the XGBoost score. For example, if all cases with a score below 0.8 are loss-making, you can set the threshold to 0.8.
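One way to read a threshold off the ROC curve is sketched below. It uses scikit-learn's roc_curve and picks the point that maximizes Youden's J statistic (TPR minus FPR). That selection rule is an assumption on my part, just one common choice; the labels and scores are made-up illustration data standing in for held-out labels and predict_proba()[:, 1].

```python
import numpy as np
from sklearn.metrics import roc_curve

# Made-up validation labels and positive-class scores (illustration only).
y_true = np.array([0, 0, 1, 0, 0, 1, 1, 1])
scores = np.array([0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.75, 0.9])

# roc_curve returns false-positive rates, true-positive rates,
# and the score thresholds that produce each (fpr, tpr) point.
fpr, tpr, thresholds = roc_curve(y_true, scores)

# Youden's J = TPR - FPR; the best threshold maximizes it.
j = tpr - fpr
best_threshold = thresholds[np.argmax(j)]

print("chosen threshold:", best_threshold)
labels = (scores >= best_threshold).astype(int)
print("labels at that threshold:", labels)
```

If false positives and false negatives have different costs, replace J with a cost-weighted criterion evaluated at each candidate threshold; that is the business-outcome approach expressed in code.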