I'm trying to estimate the cosine similarity between each document i in Corpus A and all documents in Corpus B.
Any idea how I can do this efficiently? I'm working with pretty large datasets.
Essentially, for each document in Corpus A, I want to get the most similar document(s) in Corpus B.
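To make the goal concrete, here is a minimal sketch of the computation. It assumes each document has already been turned into a dense numeric vector (for example a TF-IDF row), which is not stated above. The function name `most_similar` and the `chunk_size` parameter are hypothetical, chosen only for illustration. Rows are L2-normalised so that a plain dot product equals cosine similarity, and Corpus A is processed in chunks so the full A-by-B similarity matrix never has to sit in memory at once.

```python
import numpy as np

def _normalise_rows(X):
    """Scale each row to unit L2 norm; all-zero rows are left as zeros."""
    X = np.asarray(X, dtype=np.float64)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    return X / norms

def most_similar(A, B, chunk_size=1024):
    """For each row of A, return (index of the closest row of B, its cosine similarity).

    Hypothetical helper: A and B are assumed to be 2-D arrays with one
    document vector per row and the same number of columns.
    """
    A = _normalise_rows(A)
    B = _normalise_rows(B)
    best_idx = np.empty(A.shape[0], dtype=np.int64)
    best_sim = np.empty(A.shape[0], dtype=np.float64)
    # Work through A in chunks: each chunk yields a (chunk, len(B)) block of similarities.
    for start in range(0, A.shape[0], chunk_size):
        stop = min(start + chunk_size, A.shape[0])
        sims = A[start:stop] @ B.T  # dot product of unit vectors = cosine similarity
        best_idx[start:stop] = sims.argmax(axis=1)
        best_sim[start:stop] = sims.max(axis=1)
    return best_idx, best_sim

# Toy vectors standing in for Corpus A (2 documents) and Corpus B (3 documents).
A = np.array([[1, 0, 0], [0, 1, 1]])
B = np.array([[0, 2, 2], [3, 0, 0], [1, 1, 0]])
idx, sim = most_similar(A, B)
print(idx, sim)
```

Note that `argmax` returns only the first best match per document; returning several near-ties would need something like `np.argpartition` on each chunk instead.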