
I'm using scikit-learn for text classification. When I used the classification_report() function, it returned the following table:

                precision  recall   f1-score   support

        neg       0.86      0.69      0.77       351
        pos       0.82      0.92      0.87       517

avg / total       0.83      0.83      0.83       868

What is the meaning of precision, recall and f1-score? What conclusions can be drawn from the above values? Also, do these values say anything about the quality of my classifier?

That question is not about programming. Note that you can, and probably should, find and read the corresponding Wikipedia articles.

1 Answer


Recall reflects how many of the examples that truly belong to a given class were labeled as that class. Precision reflects how many of the examples that your classifier labeled as a given class really belong to that class.
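If it helps to see this on concrete numbers, here is a minimal sketch with made-up labels (the six examples below are illustrative, not taken from your data):

    from sklearn.metrics import precision_score, recall_score

    y_true = ['neg', 'neg', 'neg', 'pos', 'pos', 'pos']  # actual classes
    y_pred = ['neg', 'pos', 'pos', 'pos', 'pos', 'pos']  # classifier output

    # 1 of the 3 actual neg examples was labeled neg -> recall 0.33
    print(recall_score(y_true, y_pred, pos_label='neg'))
    # 1 example was labeled neg, and it really is neg -> precision 1.0
    print(precision_score(y_true, y_pred, pos_label='neg'))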

Suppose you have the two classes neg and pos. If you now label all of your examples as class neg, then your recall for neg will be perfect, 1.00 or 100%, because whenever an example was of class neg you labeled it neg. At the same time the recall for pos will be horrible, because not a single example of class pos was labeled pos. Additionally, your precision for neg will be bad, because many of the examples labeled neg were really pos.

Conversely, you might give an example the label neg only if you are absolutely sure that it belongs to class neg. Then your recall for neg will most likely be horrible, because you catch hardly any of the neg examples. However, your precision will be great, because (nearly) all of the examples labeled neg really are of class neg.

So: labeling everything as class A results in high recall but bad precision for class A. Labeling nearly nothing as class A usually results in low recall but high precision for class A.
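You can reproduce the first extreme with a few lines, using a degenerate "classifier" that labels everything neg (the ten examples are again made up):

    from sklearn.metrics import classification_report

    y_true = ['neg'] * 3 + ['pos'] * 7
    y_pred = ['neg'] * 10  # label everything as neg

    # recall for neg is 1.00 (every true neg was caught), but its
    # precision is only 0.30; pos gets 0.00 for both (scikit-learn
    # will warn that precision for pos is ill-defined here)
    print(classification_report(y_true, y_pred))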

The F1-score that is also listed is the harmonic mean of precision and recall, i.e. a single number that combines the two. A high F1-score means both recall and precision are good; a low one means at least one of the two is bad, since the harmonic mean is dominated by the smaller value.
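Concretely, F1 = 2 * precision * recall / (precision + recall), so you can check the neg row of your table yourself:

    precision, recall = 0.86, 0.69  # the neg row from your table
    f1 = 2 * precision * recall / (precision + recall)
    print(round(f1, 2))  # 0.77, matching the report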

From your example values you can conclude that your classifier's performance is generally not too bad, with an average F1-score of 0.83. The recall for neg is a bit low compared to the other values, so your classifier has trouble spotting neg examples and labels them as pos instead (which in turn lowers the precision for pos). If these are the results on your training set rather than your test set, the difference in the support values indicates that you have more examples for pos than for neg, meaning you are training on a slightly skewed dataset. Balancing those numbers could also lead to a more balanced recall.
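If collecting more neg examples is not an option, many scikit-learn classifiers also accept a class_weight parameter that makes mistakes on the rarer class more costly during training. A sketch (LogisticRegression is just a stand-in for whatever classifier you actually use, and X_train/y_train for your own data):

    from sklearn.linear_model import LogisticRegression

    # 'balanced' reweights classes inversely to their frequency, so the
    # underrepresented neg class counts more during training
    clf = LogisticRegression(class_weight='balanced')
    # clf.fit(X_train, y_train)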

Further reading:

https://en.wikipedia.org/wiki/Precision_and_recall
https://en.wikipedia.org/wiki/F1_score