
I have a skewed binary classification dataset (5,000,000 positive examples and only 8,000 negative ones), so I know accuracy is not a useful evaluation metric. I know how to calculate precision and recall mathematically, but I am unsure how to implement them in Python.

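When I train the model on all the data, I get 99% overall accuracy but 0% accuracy on the negative examples (i.e., it classifies everything as positive). With negatives making up only about 0.16% of the data, predicting positive for every example already scores about 99.8% accuracy.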

I have built my current model in PyTorch, using `criterion = nn.CrossEntropyLoss()` and `optimiser = optim.Adam()`.

So my question is: how do I incorporate precision and recall into my training to produce the best possible model?

Thanks in advance


2 Answers


In Python, implementations of precision, recall, F1 score and other metrics are usually imported from the scikit-learn library.

link: http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics
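As a minimal sketch, assuming the negative (minority) class is encoded as label 0 and the positive class as 1 (the labels below are made up for illustration), scikit-learn's `precision_score`, `recall_score` and `f1_score` can score the minority class directly by passing `pos_label=0`:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Made-up labels for illustration: 1 = positive (majority), 0 = negative (minority).
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 1]

# pos_label=0 makes the minority (negative) class the one being scored.
precision = precision_score(y_true, y_pred, pos_label=0)
recall = recall_score(y_true, y_pred, pos_label=0)
f1 = f1_score(y_true, y_pred, pos_label=0)
print(precision, recall, f1)  # 0.5 0.5 0.5

# A model that predicts "positive" for everything finds none of the negatives.
all_positive = [1] * len(y_true)
recall_all_positive = recall_score(y_true, all_positive, pos_label=0)
print(recall_all_positive)  # 0.0
```

In a PyTorch validation loop, the predicted labels would typically come from something like `logits.argmax(dim=1).cpu().numpy()`, collected over the whole validation set before scoring. These metrics are for monitoring and model selection: precision and recall on hard predictions are not differentiable, so `nn.CrossEntropyLoss` does not optimize them directly.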

Regarding your classification task, the positive training samples outnumber the negative ones by a huge margin (about 625 to 1). Try training with fewer positive samples or generating more negative samples. I am not sure a deep neural network can give you a good result given this degree of class imbalance.
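Training with fewer positive samples (random undersampling) can be sketched in NumPy as below. `undersample_majority` is a hypothetical helper, not a library function, and the toy data stands in for the real features; the `ratio` argument lets you keep more than one positive per negative if a 1:1 split throws away too much data.

```python
import numpy as np


def undersample_majority(X, y, majority_label=1, ratio=1.0, seed=0):
    """Hypothetical helper: randomly drop majority-class rows so that roughly
    ratio * n_minority majority rows remain."""
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    n_keep = min(len(maj_idx), int(ratio * len(min_idx)))
    kept_maj = rng.choice(maj_idx, size=n_keep, replace=False)  # sample without replacement
    idx = np.concatenate([kept_maj, min_idx])
    rng.shuffle(idx)  # mix the classes back together
    return X[idx], y[idx]


# Toy data: 1000 positives (label 1) and 20 negatives (label 0).
X = np.random.default_rng(1).normal(size=(1020, 5))
y = np.array([1] * 1000 + [0] * 20)
X_bal, y_bal = undersample_majority(X, y, ratio=1.0)
print(np.bincount(y_bal))  # [20 20]
```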

Negative (minority-class) samples can be generated using the Synthetic Minority Over-sampling Technique (SMOTE). This link is a good place to start: https://www.analyticsvidhya.com/blog/2017/03/imbalanced-classification-problem/
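The core idea of SMOTE is to create a synthetic minority sample by picking a minority example, picking one of its k nearest minority-class neighbours, and interpolating at a random point on the line between them. The sketch below shows just that idea in NumPy; `smote_like` is a hypothetical name, and the brute-force distance matrix is only suitable for toy data. For real use, the imbalanced-learn package provides `SMOTE().fit_resample(X, y)`.

```python
import numpy as np


def smote_like(X_min, n_new, k=5, seed=0):
    """Minimal sketch of SMOTE's core idea (hypothetical helper): interpolate
    between a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise squared Euclidean distances within the minority class.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest
    base = rng.integers(0, len(X_min), size=n_new)  # random starting samples
    nb = neighbours[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


X_min = np.random.default_rng(2).normal(size=(20, 3))  # toy minority samples
X_new = smote_like(X_min, n_new=80)
print(X_new.shape)  # (80, 3)
```

Because every synthetic point lies between two real minority points, the new samples stay inside the region the minority class already occupies rather than being pure noise.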

Try simple models such as logistic regression or a random forest first, and check whether the model's F1 score improves.
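A quick baseline comparison might look like the following sketch. It uses a synthetic dataset from `make_classification` as a stand-in for the real data (class 0 is roughly 1% of samples) and reports F1 on the minority class rather than overall accuracy; the model settings are illustrative, not tuned.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: class 0 makes up about 1% of samples.
X, y = make_classification(n_samples=20000, n_features=10, weights=[0.01],
                           random_state=0)
# stratify=y keeps the class ratio the same in the train and test splits.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y,
                                          random_state=0)

scores = {}
for name, model in [("logreg", LogisticRegression(max_iter=1000)),
                    ("forest", RandomForestClassifier(n_estimators=100, random_state=0))]:
    model.fit(X_tr, y_tr)
    # F1 on the minority class (label 0); zero_division=0 covers the case
    # where a model never predicts class 0.
    scores[name] = f1_score(y_te, model.predict(X_te), pos_label=0, zero_division=0)
print(scores)
```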


To add to the other answer, some classifiers have a parameter called class_weight which lets you modify the loss function. By penalizing wrong predictions on the minority class more heavily, you can train your classifier to learn to predict both classes. For a PyTorch-specific answer, you can refer to this link.
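In PyTorch, `nn.CrossEntropyLoss` accepts a `weight` tensor with one entry per class, which plays the same role as `class_weight`. The sketch below assumes label 0 = negative and label 1 = positive, and uses the class counts from the question with the same formula as scikit-learn's `class_weight="balanced"` (n_samples / (n_classes * count)); raw inverse-frequency weights this large may need tuning in practice.

```python
import torch
import torch.nn as nn

# Class counts from the question; assumes label 0 = negative, label 1 = positive.
counts = torch.tensor([8_000.0, 5_000_000.0])
# Same formula as scikit-learn's class_weight="balanced": n_samples / (n_classes * count).
weights = counts.sum() / (len(counts) * counts)  # roughly [313.0, 0.5008]

# Weighted replacement for the unweighted nn.CrossEntropyLoss() in the question.
criterion = nn.CrossEntropyLoss(weight=weights)

# Per-sample view: two equally wrong predictions, one for each class.
per_sample = nn.CrossEntropyLoss(weight=weights, reduction="none")
logits = torch.tensor([[0.0, 2.0],   # true class 0, model favours class 1
                       [2.0, 0.0]])  # true class 1, model favours class 0
targets = torch.tensor([0, 1])
losses = per_sample(logits, targets)
print(losses)  # the minority-class mistake costs about 625x more
```

If the model runs on a GPU, the weight tensor has to be on the same device as the logits, e.g. `weights.to(device)` before building the criterion.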

As mentioned in the other answer, oversampling and undersampling strategies can also be used. If you are looking for something more sophisticated, take a look at this paper.
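One way to oversample the minority class within a PyTorch training loop is `torch.utils.data.WeightedRandomSampler`, which draws indices with replacement according to per-sample weights. This is a sketch of plain random oversampling on toy data, not the method from the paper mentioned above: each sample is weighted by the inverse of its class count, so batches come out roughly balanced.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy imbalanced data: 990 positives (label 1) and 10 negatives (label 0).
X = torch.randn(1000, 4)
y = torch.cat([torch.ones(990, dtype=torch.long), torch.zeros(10, dtype=torch.long)])

# Weight each sample by the inverse of its class count, so both classes are
# drawn with roughly equal probability (minority samples get repeated).
class_counts = torch.bincount(y)  # counts per class: [10, 990]
sample_weights = 1.0 / class_counts[y].float()
sampler = WeightedRandomSampler(sample_weights, num_samples=len(y), replacement=True,
                                generator=torch.Generator().manual_seed(0))

# The sampler replaces shuffle=True; the model and loss stay unchanged.
loader = DataLoader(TensorDataset(X, y), batch_size=100, sampler=sampler)
drawn = torch.cat([yb for _, yb in loader])
print(torch.bincount(drawn))  # roughly 500 of each class
```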