I've calculated the TF for my dataset and I'm now trying to calculate the IDF, but I'm confused about which number to use in the calculation.
id uid
1 a
1 b
1 c
1 d
2 a
2 b
2 c
2 e
3 b
3 c
3 e
3 f
(3 items, i.e. ids 1, 2 and 3)
Occurrence (number of items each uid appears in)
a = 2
b = 3
c = 3
d = 1
e = 2
f = 1
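As a sanity check, the occurrence counts can be derived directly from the (id, uid) table. This is a minimal sketch that assumes each id is one document and counts the distinct ids per uid; the names `rows` and `doc_freq` are illustrative only.

```python
from collections import defaultdict

# (id, uid) pairs copied from the table above; each id is treated as one document
rows = [
    (1, "a"), (1, "b"), (1, "c"), (1, "d"),
    (2, "a"), (2, "b"), (2, "c"), (2, "e"),
    (3, "b"), (3, "c"), (3, "e"), (3, "f"),
]

# Collect the set of ids (documents) each uid (term) appears in
docs_per_term = defaultdict(set)
for doc_id, term in rows:
    docs_per_term[term].add(doc_id)

# Occurrence = number of distinct documents containing the term
doc_freq = {term: len(ids) for term, ids in sorted(docs_per_term.items())}
print(doc_freq)  # {'a': 2, 'b': 3, 'c': 3, 'd': 1, 'e': 2, 'f': 1}
```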
This gives something like the matrix below, where each cell is the number of items in which both uids appear:
A B C
A - 2 2
B 2 - 3
C 2 3 -
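The matrix can be reproduced with a short script. This sketch assumes A, B and C stand for the uids a, b and c, and that each cell counts the ids containing both uids; that reading matches every value shown, but it is an inference. The names `terms_per_doc` and `cooccur` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# (id, uid) pairs copied from the table above
rows = [
    (1, "a"), (1, "b"), (1, "c"), (1, "d"),
    (2, "a"), (2, "b"), (2, "c"), (2, "e"),
    (3, "b"), (3, "c"), (3, "e"), (3, "f"),
]

# Group the uids by id
terms_per_doc = defaultdict(set)
for doc_id, term in rows:
    terms_per_doc[doc_id].add(term)

# For every unordered pair of uids, count the ids that contain both
cooccur = defaultdict(int)
for terms in terms_per_doc.values():
    for t1, t2 in combinations(sorted(terms), 2):
        cooccur[(t1, t2)] += 1

# Print the a/b/c block in the same layout as the A/B/C matrix
labels = ["a", "b", "c"]
print("  " + " ".join(label.upper() for label in labels))
for r in labels:
    cells = ["-" if r == c else str(cooccur[tuple(sorted((r, c)))]) for c in labels]
    print(r.upper() + " " + " ".join(cells))
```

Running it prints the same A/B/C matrix as above.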
Formula
IDF(t, D) = log(total number of documents / number of documents matching term t)
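Applied to single uids, the formula works out as follows. This is a minimal sketch assuming the natural logarithm (a different base only rescales every value by the same constant) and the occurrence counts listed earlier; `idf` is a hypothetical helper name.

```python
import math

# Occurrence counts from the list above; total documents from "(3 items)"
doc_freq = {"a": 2, "b": 3, "c": 3, "d": 1, "e": 2, "f": 1}
total_docs = 3

def idf(term):
    # IDF(t, D) = log(total documents / documents containing t), natural log assumed
    return math.log(total_docs / doc_freq[term])

for term in sorted(doc_freq):
    print(term, round(idf(term), 4))
```

With these counts, b and c appear in every item, so their IDF is log(3/3) = 0, while d and f get the largest value, log(3) ≈ 1.0986.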
For example, taking the pair (A, B), whose value is 2, how should I go about calculating it?
Total items = 3
Number of documents matching the term = should I be using the value for A or for B? (2 or 3)
(A, B) * log(total / matching)
= 2 * log(3 / 2) or 2 * log(3 / 3)?
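For comparison, the two candidate values can be computed side by side. This sketch only evaluates the expression as written in the question with each possible denominator; it does not settle which one is correct. The names `pair_value`, `df` and `results` are illustrative.

```python
import math

total_docs = 3
pair_value = 2          # the (A, B) entry from the matrix
df = {"A": 2, "B": 3}   # number of items containing a and b, respectively

# Evaluate (A, B) * log(total / matching) with each candidate "matching" count
results = {}
for term, matching in df.items():
    results[term] = pair_value * math.log(total_docs / matching)
    print(f"using {term}: 2 * log(3/{matching}) = {results[term]:.4f}")
```

Using B's count gives log(3/3) = 0, so the product is 0 regardless of the pair value, because b appears in every item. Using A's count gives 2 * log(1.5) ≈ 0.8109 with the natural log.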