I am trying to use XGBoost for classification, but I am suspicious of its accuracy: with default parameters, the precision is 100%.
import xgboost as xgb
from sklearn.metrics import precision_score

# Train with default parameters
xg_cl_default = xgb.XGBClassifier()
xg_cl_default.fit(trainX, trainY)
preds = xg_cl_default.predict(testX)
precision_score(testY, preds)
# 1.0
However, my data is imbalanced, so I set the scale_pos_weight parameter along with a few other parameters, as shown below:
# negative:positive class ratio, computed with PySpark
ratio = int(df_final.filter(col('isFraud') == 0).count() / df_final.filter(col('isFraud') == 1).count())
xg_cl = xgb.XGBClassifier(scale_pos_weight = ratio, n_estimators=50)
eval_set = [(valX, valY.values.ravel())]
xg_cl.fit(trainX, trainY.values.ravel(), eval_metric="error",
          early_stopping_rounds=10, eval_set=eval_set, verbose=True)
preds = xg_cl.predict(testX)
precision_score(testY,preds)
# 1.0
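Since trainX and testX here are pandas/NumPy objects, the ratio could also be computed directly in pandas instead of going through Spark. A minimal sketch, assuming df_final is available as a pandas DataFrame with the same 'isFraud' column (the sample data below is made up):

```python
import pandas as pd

# Hypothetical stand-in for df_final: 99 legitimate rows, 1 fraud row
df_final = pd.DataFrame({'isFraud': [0] * 99 + [1] * 1})

neg = (df_final['isFraud'] == 0).sum()  # count of negative (majority) class
pos = (df_final['isFraud'] == 1).sum()  # count of positive (minority) class

# scale_pos_weight is conventionally count(negative) / count(positive)
ratio = neg / pos
print(ratio)  # 99.0
```

Keeping the ratio as a float (rather than truncating with int()) preserves a bit of precision, though in practice the exact value is rarely critical.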
In both cases precision is 100% and recall is 99%. Given how highly imbalanced the data is, these numbers look too good to be true, and I do not trust them.
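One way to sanity-check scores like these is to inspect the confusion matrix rather than a single summary metric, since it exposes exactly where the few positive examples go. A small synthetic sketch (the labels and predictions below are made up to mimic an imbalanced fraud set):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical imbalanced labels: 990 legitimate (0), 10 fraud (1)
y_true = np.array([0] * 990 + [1] * 10)

# Hypothetical predictions: 9 frauds caught, 1 missed, no false alarms
y_pred = np.array([0] * 990 + [1] * 9 + [0] * 1)

print(confusion_matrix(y_true, y_pred))
# rows = true class, columns = predicted class:
# [[990   0]
#  [  1   9]]
print(precision_score(y_true, y_pred))  # 1.0 -- zero false positives
print(recall_score(y_true, y_pred))     # 0.9 -- one fraud case missed
```

If the real confusion matrix on testX shows essentially no misclassified rows at all, that usually points to something other than model quality, for example a feature that leaks the label, or evaluating the wrong model object, rather than a genuinely perfect classifier.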