How do I calculate the Euclidean distance (similarity) between two documents, e.g. D1 and D2, using relative frequency?

Below is an example of both cosine similarity and Euclidean distance between two documents using absolute frequency.

D1 (frequencies) = (4, 9, 7, 0, 0, 3), length |D1| = sqrt(16 + 81 + 49 + 9) = sqrt(155) = 12.45

D2 (frequencies) = (4, 5, 0, 7, 5, 0), length |D2| = sqrt(16 + 25 + 49 + 25) = sqrt(115) = 10.72

Cosine(D1, D2) = (4x4 + 9x5) / (12.45 x 10.72) = 0.4569. For cosine, absolute frequency and relative frequency give the same result, because dividing a vector by a positive constant does not change its direction.
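The cosine calculation above can be checked with a minimal sketch. The helper names `cosine` and `relative` are illustrative, and `relative` assumes that "relative frequency" means each count divided by the document's total count, which the question does not confirm.

```python
import math

# Term counts copied from the example above.
d1 = [4, 9, 7, 0, 0, 3]
d2 = [4, 5, 0, 7, 5, 0]

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def relative(u):
    """Assumed definition: each count divided by the document's total count."""
    total = sum(u)
    return [a / total for a in u]

cos_abs = cosine(d1, d2)
cos_rel = cosine(relative(d1), relative(d2))
print(round(cos_abs, 4), round(cos_rel, 4))
```

Both values agree, which matches the claim that cosine is unaffected by rescaling each document.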

Also

Euclidean(D1, D2) = sqrt( sqr(4-4) + sqr(9-5) + sqr(7-0) + sqr(0-7) + sqr(0-5) + sqr(3-0) ) = sqrt(0 + 16 + 49 + 49 + 25 + 9) = sqrt(148) = 12.17 (absolute frequency).
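For reference, the absolute-frequency distance can be reproduced with a short sketch (the function name `euclidean` is just an illustrative helper):

```python
import math

# Term counts copied from the example above.
d1 = [4, 9, 7, 0, 0, 3]
d2 = [4, 5, 0, 7, 5, 0]

def euclidean(u, v):
    """Square root of the summed squared differences, term by term."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

dist_abs = euclidean(d1, d2)
print(round(dist_abs, 2))  # 12.17
```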

With relative frequency, the answer is given as 0.2532.

I'm trying to get the Euclidean distance using relative frequency for this problem, but I haven't found any tutorial that helps. All I could find was the answer 0.2532, without a formula or explanation.

I'm not quite sure what relative frequency means in this context. It's defined as the absolute frequency divided by the total number of events. It's not clear to me what you would divide by to get the relative frequencies. - Salix alba

1 Answer

Read up on Euclidean distance here to get a better understanding.
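As a rough sketch of one possible reading: if "relative frequency" means each count divided by that document's total count (the definition mentioned in the comment, and an assumption here), the Euclidean distance can be computed as below. Unlike cosine, Euclidean distance does change under this rescaling. Under this assumption the result is about 0.55, not 0.2532, so the quoted figure presumably relies on a different normalization that the question does not state.

```python
import math

# Term counts copied from the question.
d1 = [4, 9, 7, 0, 0, 3]
d2 = [4, 5, 0, 7, 5, 0]

def relative(u):
    """Assumed definition: each count divided by the document's total count."""
    total = sum(u)
    return [a / total for a in u]

def euclidean(u, v):
    """Square root of the summed squared differences, term by term."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# D1 has 23 terms in total and D2 has 21, so each vector is scaled differently.
dist_rel = euclidean(relative(d1), relative(d2))
print(dist_rel)
```

Swapping in another normalization only requires changing `relative`, for example dividing by the vector length instead of the total count.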