
I have some labeled data which classifies datasets as positive or negative. Now I have an algorithm that does the same automatically, and I want to compare the results.

I was told to use precision and recall, but I'm not sure whether those are appropriate, because the true negatives don't even appear in the formulas. I'd rather use a general "prediction rate" for both positives and negatives.
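To make the concern concrete, here is a minimal sketch (plain Python, with purely hypothetical confusion-matrix counts) showing that precision and recall are built from TP, FP and FN only, while an overall "prediction rate" for both classes (accuracy) also uses the true negatives:

```python
# Hypothetical confusion-matrix counts, chosen only for illustration.
TP, FP, FN, TN = 40, 10, 5, 45

precision = TP / (TP + FP)                    # 0.80  -- TN plays no role
recall    = TP / (TP + FN)                    # ~0.89 -- TN plays no role
accuracy  = (TP + TN) / (TP + FP + FN + TN)   # 0.85  -- uses TN as well

print(precision, recall, accuracy)
```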

What would be a good way to evaluate the algorithm? Thanks!

Can you please post your code? – Giuseppe Garassino
The results look like this:

    data      | user | algorithm
    ----------|------|----------
    some text | pos  | pos
    other txt | neg  | pos
    whatever  | neg  | neg
    littlepny | pos  | neg
    stackover | neg  | pos

– classification_guy
Sorry for the format... I'm new here... the results look like this: {[some text, pos, pos]; [other txt, neg, pos]; [whatever, neg, neg]; [littlepny, pos, neg]} ...so it's some data, then the manual annotation, then the program's output. ...and I'm just not sure if I should use precision/recall to show how well it works, or some other numbers... ;-) – classification_guy
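For reference, a minimal sketch (assuming Python with scikit-learn, which the question does not mention) of how a confusion matrix could be built from the five labelled examples in the comment above:

```python
from sklearn.metrics import confusion_matrix

y_true = ["pos", "neg", "neg", "pos", "neg"]   # manual annotation
y_pred = ["pos", "pos", "neg", "neg", "pos"]   # algorithm's output

# Rows are the true classes, columns the predicted classes.
print(confusion_matrix(y_true, y_pred, labels=["pos", "neg"]))
# [[1 1]   -> 1 true positive, 1 false negative
#  [2 1]]  -> 2 false positives, 1 true negative
```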

1 Answer


There is no general "best" method of evaluation; everything depends on what your aim is, as each method captures a different phenomenon (a minimal sketch computing these measures follows the list):

  • Accuracy is a simple measure, well suited for multi-label classification and reasonably well-balanced data
  • F1-score captures the precision/recall tradeoff
  • MCC (Matthews correlation coefficient) is a good measure, well suited for datasets with a large disproportion in class sizes
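As a minimal sketch (again assuming Python with scikit-learn, and reusing the five labelled examples from the comments under the question), the three measures above could be computed like this:

```python
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

y_true = ["pos", "neg", "neg", "pos", "neg"]   # manual annotation
y_pred = ["pos", "pos", "neg", "neg", "pos"]   # algorithm's output

print(accuracy_score(y_true, y_pred))              # 0.4
print(f1_score(y_true, y_pred, pos_label="pos"))   # 0.4
print(matthews_corrcoef(y_true, y_pred))           # ~ -0.167
```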