I have a really big (1.5M x 16M) sparse csr scipy matrix A. What i need to compute is the similarity of each pair of rows. I have defined the similarity as this:
Assume a and b are two rows of matrix A
a = (0, 1, 0, 4)
b = (1, 0, 2, 3)
Similarity (a, b) = 0*1 + 1*0 + 0*2 + 4*3 = 12
To compute all pairwise row similarities I use this (or Cosine similarity):
AT = np.transpose(A)
pairs = A.dot(AT)
Now pairs[i, j] is the similarity of row i and row j for all such i and j. This is quite similar to pairwise Cosine similarity of rows. So If there is an efficient parallel algorithm that computes pairwise Cosine similarity it would work for me as well.
The problem: This dot product is very slow because it uses just one cpu (I have access to 64 of those cpus on my server).
I can also export A and AT to a file and run any other external program that does the multiplication in parallel and get the results back to the Python program.
Is there any more efficient way of doing this dot product? or computing the pairwise similarity in Parallel?