
I have some labeled data which classifies datasets as positive or negative. Now I have an algorithm that does the same automatically, and I want to compare the results.

I was told to use precision and recall, but I'm not sure whether those are appropriate, because the true negatives don't even appear in the formulas. I'd rather use a general "prediction rate" for both positives and negatives.
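To make the concern concrete, here is a minimal sketch (plain Python, with purely hypothetical confusion-matrix counts) showing that precision and recall are built from TP, FP and FN only, while an overall "prediction rate" for both classes (accuracy) also uses the true negatives:

```python
# Hypothetical confusion-matrix counts, chosen only for illustration.
TP, FP, FN, TN = 40, 10, 5, 45

precision = TP / (TP + FP)                    # 0.80  -- TN plays no role
recall    = TP / (TP + FN)                    # ~0.89 -- TN plays no role
accuracy  = (TP + TN) / (TP + FP + FN + TN)   # 0.85  -- uses TN as well

print(precision, recall, accuracy)
```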

What would be a good way to evaluate the algorithm? Thanks!

Can you please post your code? – Giuseppe Garassino
The results look like this:

    data      | user | algorithm
    ----------|------|----------
    some text | pos  | pos
    other txt | neg  | pos
    whatever  | neg  | neg
    littlepny | pos  | neg
    stackover | neg  | pos

– classification_guy
Sorry for the format... I'm new here... the results look like this: {[some text, pos, pos]; [other txt, neg, pos]; [whatever, neg, neg]; [littlepny, pos, neg]} ...so it's some data, then the manual annotation, then the program's output. ...and I'm just not sure if I should use precision/recall to show how well it works, or some other numbers... ;-) – classification_guy
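For reference, a minimal sketch (assuming Python with scikit-learn, which the question does not mention) of how a confusion matrix could be built from the five labelled examples in the comment above:

```python
from sklearn.metrics import confusion_matrix

y_true = ["pos", "neg", "neg", "pos", "neg"]   # manual annotation
y_pred = ["pos", "pos", "neg", "neg", "pos"]   # algorithm's output

# Rows are the true classes, columns the predicted classes.
print(confusion_matrix(y_true, y_pred, labels=["pos", "neg"]))
# [[1 1]   -> 1 true positive, 1 false negative
#  [2 1]]  -> 2 false positives, 1 true negative
```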

1 Answer


There is no general "best" method of evaluation; everything depends on what your aim is, as each method captures a different phenomenon (a minimal sketch computing these measures follows the list):

  • Accuracy is a simple measure, well suited for multi-label classification and reasonably well-balanced data
  • F1-score captures the precision/recall tradeoff
  • MCC (Matthews correlation coefficient) is a good measure, well suited for datasets with a large disproportion in class sizes
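As a minimal sketch (again assuming Python with scikit-learn, and reusing the five labelled examples from the comments under the question), the three measures above could be computed like this:

```python
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = ["pos", "neg", "neg", "pos", "neg"]   # manual annotation
y_pred = ["pos", "pos", "neg", "neg", "pos"]   # algorithm's output

print(accuracy_score(y_true, y_pred))              # 0.4
print(f1_score(y_true, y_pred, pos_label="pos"))   # 0.4
print(matthews_corrcoef(y_true, y_pred))           # ~ -0.167
```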