
I have to solve a binary classification problem (the ratio of training data between label 0 and label 1 is 4.7:1), so I built a model with the xgboost algorithm. The results are quite good:
- AUC: 0.989
- Precision(0/1): 0.998 / 0.938
- Recall(0/1): 0.992 / 0.986
- F Score(0/1): 0.995 / 0.962
But I want to increase the precision of label 1 (0.938), so I tried tuning the xgboost parameters, in particular scale_pos_weight. First, I applied the value recommended in the xgboost documentation: num(negative) / num(positive) = 4.7.
scale_pos_weight=4.7
- AUC: 0.973
- Precision(0/1): 0.999 / 0.807
- Recall(0/1): 0.971 / 0.994
- F Score(0/1): 0.985 / 0.891
The precision of label 1 decreased and its recall increased. Then I tried the opposite and applied the reciprocal of 4.7:
scale_pos_weight=1/4.7
- AUC: 0.988
- Precision(0/1): 0.992 / 0.956
- Recall(0/1): 0.995 / 0.937
- F Score(0/1): 0.993 / 0.946
This is the result I wanted, but I don't understand why it came out this way. Could someone explain what is happening?
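For reference, the effect described above can be reproduced with any classifier that supports per-class weights. The sketch below uses scikit-learn's LogisticRegression with `class_weight` playing the role of xgboost's scale_pos_weight (the synthetic dataset and all names are illustrative, not the asker's actual data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data, roughly 4.7:1 negatives to positives.
X, y = make_classification(n_samples=20000, weights=[0.825, 0.175],
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# class_weight={1: w} plays the role of scale_pos_weight:
# w > 1 pushes the model toward predicting the positive class.
results = {}
for w in (1 / 4.7, 1.0, 4.7):
    clf = LogisticRegression(class_weight={0: 1.0, 1: w}, max_iter=1000)
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    results[w] = (precision_score(y_te, pred), recall_score(y_te, pred))
    print(f"w={w:.3f}  precision(1)={results[w][0]:.3f}"
          f"  recall(1)={results[w][1]:.3f}")
```

On data like this you should see the same pattern as in the question: a large positive weight raises recall of label 1 at the cost of precision, and the reciprocal weight does the reverse.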

1 Answer

I had the same situation. With scale_pos_weight = 4.7 (the documented recommendation), every positive example is weighted 4.7×, so the loss treats the classes as if they were balanced and the model behaves as though it should predict label 1 far more often than it actually occurs. It therefore flags many borderline negatives as positive: false positives go up, so the precision of label 1 drops while its recall rises. The reciprocal value does the opposite: positives are down-weighted, the model only predicts label 1 when it is very confident, and precision rises at the cost of recall.
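Another way to see it: for a probabilistic classifier, multiplying the positive-class weight by w is (to a first approximation) equivalent to keeping the model fixed and moving the decision threshold on the raw probability p from 0.5 to 1/(1+w). The reweighted posterior is w·p / (w·p + (1−p)), and it exceeds 0.5 exactly when p > 1/(1+w). A short numeric check of that identity (a sketch of the general principle, not of xgboost's exact loss):

```python
import numpy as np

def reweighted(p, w):
    # Posterior after multiplying the positive-class weight by w.
    return w * p / (w * p + (1.0 - p))

p = np.linspace(0.01, 0.99, 99)
mismatches = {}
for w in (4.7, 1 / 4.7):
    thresh = 1.0 / (1.0 + w)  # equivalent threshold on the unweighted p
    flipped = np.logical_xor(reweighted(p, w) > 0.5, p > thresh)
    mismatches[w] = int(flipped.sum())
    print(f"w={w:.3f}: effective threshold {thresh:.3f}, "
          f"disagreements {mismatches[w]}")
```

So w = 4.7 effectively lowers the threshold to about 0.175 (more positive predictions, higher recall, lower precision), while w = 1/4.7 raises it to about 0.825 (fewer but more confident positive predictions, hence the precision gain observed in the question).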