I have a dataset of about 20,000 training examples on which I want to do binary classification. The problem is that the dataset is heavily imbalanced, with only around 1,000 examples in the positive class. I am trying to use xgboost (in R) for the prediction.
I have tried both oversampling and undersampling, and no matter what I do, the predictions always end up classifying everything as the majority class.
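For reference, here is a minimal sketch of my current setup (simplified; `train` and its `label` column are stand-ins for my actual data and feature engineering):

```r
library(xgboost)

# 'train' is a data.frame with ~20,000 rows; 'label' is 0/1 with ~1,000 positives.
features <- as.matrix(train[, setdiff(names(train), "label")])
dtrain <- xgb.DMatrix(data = features, label = train$label)

model <- xgb.train(params = list(objective = "binary:logistic",
                                 eval_metric = "auc",
                                 eta = 0.1,
                                 max_depth = 6),
                   data = dtrain,
                   nrounds = 100)

# With the default 0.5 cutoff, almost all predicted probabilities fall below
# the threshold, so everything gets labelled as the majority (negative) class.
preds <- as.numeric(predict(model, dtrain) > 0.5)
table(preds)
```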
I tried reading this article on how to tune xgboost parameters: https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/ — but it only mentions which parameters help with imbalanced datasets, not how to tune them.
I would appreciate any advice on tuning xgboost's learning parameters to handle imbalanced datasets, and also on how to generate a validation set in such cases.