Hi, I'm trying to calculate the cosine similarity between my query and the documents returned by my information retrieval program in Python.
For the cosine similarity I use this implementation:
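import math

def cosine_similarity(v1, v2):
    # cos(v1, v2) = (v1 . v2) / (||v1|| * ||v2||)
    sumxx, sumxy, sumyy = 0, 0, 0
    for i in range(len(v1)):
        x, y = v1[i], v2[i]
        sumxx += x*x  # squared norm of v1
        sumyy += y*y  # squared norm of v2
        sumxy += x*y  # dot product
    return sumxy/math.sqrt(sumxx*sumyy)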
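The same formula can also be written with `zip`. This is only a sketch, and the explicit checks are assumptions of mine rather than part of the original code: the loop over `range(len(v1))` silently ignores any extra entries in `v2` (or raises `IndexError` if `v2` is shorter), and it divides by zero for an all-zero vector.

```python
import math
from typing import Sequence


def cosine_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Cosine similarity: dot(v1, v2) / (||v1|| * ||v2||)."""
    if len(v1) != len(v2):
        # Both vectors must be indexed over the same terms, in the same order.
        raise ValueError(f"length mismatch: {len(v1)} != {len(v2)}")
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(y * y for y in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        # The cosine is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)
```

Raising on a zero vector is one design choice. Some retrieval code returns 0.0 instead, treating a document with no matching terms as completely dissimilar.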
I found this solution on this website, but I'm having some problems. I computed the tf*idf weights and built a vector for each document. Here is an example of a document vector and a query vector:
D: [0.028239449664633154, 0.05559373180364792, 0.02798439181455718]
Q: [0.3746433655507998, 0.526816791853616, 0.618765996788542]
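As a quick check, here is the loop from the question run on exactly these two vectors. The only change is the tuple assignment in place of the semicolon. For this pair the result is about 0.935, so it stays below 1.

```python
import math


def cosine_similarity(v1, v2):
    # Same loop as the implementation in the question.
    sumxx, sumxy, sumyy = 0, 0, 0
    for i in range(len(v1)):
        x, y = v1[i], v2[i]
        sumxx += x * x
        sumyy += y * y
        sumxy += x * y
    return sumxy / math.sqrt(sumxx * sumyy)


D = [0.028239449664633154, 0.05559373180364792, 0.02798439181455718]
Q = [0.3746433655507998, 0.526816791853616, 0.618765996788542]

sim = cosine_similarity(D, Q)
print(f"{sim:.3f}")  # prints 0.935
```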
OK, so the problem is that sometimes when I compute the cosine similarity, the result is bigger than 1. How is this possible? The cosine can't be bigger than 1, can it? Is my reasoning correct? And is cosine similarity the right measure to use in this case? Please help me, thanks.
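For reference, the Cauchy-Schwarz inequality |v1 . v2| <= ||v1|| * ||v2|| guarantees that the exact cosine lies between -1 and 1. It lies between 0 and 1 when all weights are non-negative, as in the example vectors above. Floating-point rounding can still push a computed value a few units in the last place above 1, for example to 1.0000000000000002. A minimal sketch of a sanity check might therefore tolerate that tiny overshoot and flag anything larger. `checked_cosine` and the default `tol` are hypothetical choices of mine and do not come from any library.

```python
def checked_cosine(value: float, tol: float = 1e-9) -> float:
    """Validate a computed cosine similarity and clamp rounding overshoot.

    Hypothetical helper: the exact cosine lies in [-1, 1], so a value farther
    outside than `tol` suggests a bug upstream (e.g. in how the vectors are
    built). NaN also fails the check because every comparison with NaN is False.
    """
    if not (-1.0 - tol <= value <= 1.0 + tol):
        raise ValueError(f"cosine {value!r} is outside [-1, 1]")
    # Remove a tiny floating-point overshoot past either bound.
    return max(-1.0, min(1.0, value))


print(checked_cosine(1.0000000000000002))  # prints 1.0
```

A result that fails this check, such as 1.5, cannot come from rounding alone and usually means the function is not receiving the vectors you expect.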