Given the dataset and the classification rules below, how do I calculate precision and recall?

MinSupp=3% and MinConf=30%

No. outlook temperature humidity    windy   play
1   sunny       hot     high        FALSE   no
2   sunny       hot     high        TRUE    no
3   overcast    hot     high        FALSE   yes
4   rainy       mild    high        FALSE   yes
5   rainy       cool    normal      FALSE   yes
6   rainy       cool    normal      TRUE    no
7   overcast    cool    normal      TRUE    yes
8   sunny       mild    high        FALSE   no
9   sunny       cool    normal      FALSE   yes
10  rainy       mild    normal      FALSE   yes
11  sunny       mild    normal      TRUE    yes
12  overcast    mild    high        TRUE    yes
13  overcast    hot     normal      FALSE   yes
14  rainy       mild    high        TRUE    no

Rules found:

1: (outlook,overcast) -> (play,yes) [Support=0.29 , Confidence=1.00 , Correctly Classify= 3, 7, 12, 13]

2: (humidity,normal), (windy,FALSE) -> (play,yes) [Support=0.29 , Confidence=1.00 , Correctly Classify= 5, 9, 10]

3: (outlook,sunny), (humidity,high) -> (play,no) [Support=0.21 , Confidence=1.00 , Correctly Classify= 1, 2, 8]

4: (outlook,rainy), (windy,FALSE) -> (play,yes) [Support=0.21 , Confidence=1.00 , Correctly Classify= 4]

5: (outlook,sunny), (humidity,normal) -> (play,yes) [Support=0.14 , Confidence=1.00 , Correctly Classify= 11]

6: (outlook,rainy), (windy,TRUE) -> (play,no) [Support=0.14 , Confidence=1.00 , Correctly Classify= 6, 14]

Thanks.

You can't compute precision and recall, as you don't have (or didn't provide) the ground truth values for each entry. Without these, it's impossible to find the true positives, false positives and false negatives required for the precision and recall calculation. - Omri374

1 Answer

I think that all you need to know about precision and recall can be found here.

In plain English, precision is the number of actually correct results your system retrieved divided by the number of results your system marked as correct. Likewise, recall is the number of actually correct results your system retrieved divided by the total number of actually correct results available in your dataset.
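
To make the two ratios concrete, here is a minimal Python sketch that applies them to the table and rules from the question. It rests on several assumptions that the question does not state: the `play` column is treated as the ground truth, the rules are applied in the listed order with the first matching rule deciding the prediction, and `yes` is taken as the positive class by default. The names `predict` and `precision_recall` are made up for illustration.

```python
# Weather table from the question: outlook, temperature, humidity, windy, play.
FIELDS = ("outlook", "temperature", "humidity", "windy", "play")
ROWS = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]
RECORDS = [dict(zip(FIELDS, row)) for row in ROWS]

# The six rules from the question as (conditions, predicted label).
# Assumption: they form an ordered list and the first match wins.
RULES = [
    ({"outlook": "overcast"}, "yes"),
    ({"humidity": "normal", "windy": False}, "yes"),
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "rainy", "windy": False}, "yes"),
    ({"outlook": "sunny", "humidity": "normal"}, "yes"),
    ({"outlook": "rainy", "windy": True}, "no"),
]


def predict(record):
    """Return the label of the first rule whose conditions all hold, else None."""
    for conditions, label in RULES:
        if all(record[key] == value for key, value in conditions.items()):
            return label
    return None


def precision_recall(records, positive="yes"):
    """Compute precision and recall for one class.

    Assumption: the 'play' column is the ground truth.
    """
    tp = fp = fn = 0
    for record in records:
        predicted, actual = predict(record), record["play"]
        if predicted == positive and actual == positive:
            tp += 1  # retrieved and actually correct
        elif predicted == positive:
            fp += 1  # retrieved but actually wrong
        elif actual == positive:
            fn += 1  # actually correct but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    for cls in ("yes", "no"):
        p, r = precision_recall(RECORDS, positive=cls)
        print(f"class={cls}: precision={p:.2f}, recall={r:.2f}")
```

Under these assumptions, every one of the 14 records is covered by a rule and predicted correctly, so precision and recall are both 1.00 for each class. These are the same records the rules were mined from, so the figures are optimistic; a held-out test set would give a more realistic estimate.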