I have a set of documents that I am searching for a keyword. I have calculated the tf-idf values of the keyword for every document. Suppose I store these tf-idf values in an array covering all the documents: how do I use it to calculate cosine similarity? Any help with the code is appreciated!
1 Answer
You can view the array as a collection of vectors, one per document, each with as many elements as there are terms. To determine the similarity of two documents, calculate the scalar product of the corresponding vectors in the usual manner (the sum of the products of the corresponding vector components) and divide it by the product of the norms (Euclidean lengths) of the two vectors.
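As a rough illustration of that calculation, here is a minimal sketch in plain Python. The `tfidf` values and the function name `cosine_similarity` are made up for the example, and returning 0.0 for an all-zero vector is a convention chosen here, not something the formula itself dictates.

```python
import math

def cosine_similarity(a, b):
    """Scalar product of a and b divided by the product of their norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a document with no weighted terms as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical tf-idf array: one row per document, one column per term.
tfidf = [
    [0.5, 0.0, 1.2],
    [1.0, 0.0, 2.4],
    [0.0, 0.7, 0.0],
]

sim_01 = cosine_similarity(tfidf[0], tfidf[1])  # same direction, different length
sim_02 = cosine_similarity(tfidf[0], tfidf[2])  # no terms in common
```

A keyword query can be handled the same way if you represent it as one more vector over the same terms and compare it against each document row.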
It is practical to normalize the vectors to unit length before calculating the similarities. In that case the similarity is just the scalar product of the document vectors, since every norm is one.
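A small sketch of the normalize-first approach, again in plain Python with invented data. The helper names `normalize` and `dot` are only for illustration, and leaving an all-zero vector unchanged is an assumption made to avoid dividing by zero.

```python
import math

def normalize(vec):
    """Scale vec to unit Euclidean length (all-zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def dot(a, b):
    """Scalar product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical tf-idf array: one row per document, one column per term.
tfidf = [
    [3.0, 4.0],
    [4.0, 3.0],
    [0.0, 2.0],
]

# Normalize once, then every pairwise similarity is a plain scalar product.
unit = [normalize(row) for row in tfidf]
similarities = [[dot(u, v) for v in unit] for u in unit]
```

Normalizing up front pays off when you compare many pairs, because each norm is computed once per document instead of once per comparison.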