How to calculate the (e.g., cosine) similarity between one sparse vector and a matrix (i.e., an array of sparse vectors)?
Is this possible using scikit-learn, scipy, numpy, etc.? If possible, the similarity metric should be easily changeable.

1 Answer

If you are interested in cosine similarity, you can use the pairwise metrics in sklearn. Note that pairwise_distances with metric='cosine' returns the cosine distance (1 minus the cosine similarity), as a matrix with one row per row of the first input and one column per row of the second.

Illustration:

import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

mat_1 = np.array([[1, 2, 3], [3, 4, 5]])
# The vector must be 2-D (one row) and have the same number of
# columns (features) as the matrix
vec_1 = np.array([[2, 3, 5]])
>>> print(pairwise_distances(mat_1, vec_1, metric='cosine'))
[[0.00282354]
 [0.01351234]]

Note: To change the distance metric, pass the appropriate name to the metric parameter. However, if your input is a sparse matrix, only the metrics ['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan'] can be used, as the others do not support sparse matrix inputs.
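
Since the question asks about sparse vectors specifically, here is a minimal sketch of the same idea with scipy.sparse inputs. The data and the helper name distances_to_rows are made up for illustration; the sketch assumes scipy and scikit-learn are installed and that the vector is stored as a 1-row sparse matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances

# Hypothetical data: a sparse matrix of row vectors and one sparse query vector.
mat = csr_matrix(np.array([[1, 0, 3], [0, 4, 5], [0, 0, 2]], dtype=float))
vec = csr_matrix(np.array([[2, 0, 5]], dtype=float))  # shape (1, 3)


def distances_to_rows(matrix, vector, metric="cosine"):
    """Distance from `vector` to every row of `matrix`, as a 1-D array.

    `metric` is passed straight through to pairwise_distances, so it can
    be swapped for any metric that accepts sparse input.
    """
    return pairwise_distances(matrix, vector, metric=metric).ravel()


cos_dist = distances_to_rows(mat, vec, metric="cosine")
l1_dist = distances_to_rows(mat, vec, metric="manhattan")

# cosine_similarity gives the similarity directly (1 - cosine distance).
cos_sim = cosine_similarity(mat, vec).ravel()
```

The results come back as dense NumPy arrays with one entry per row of mat. If you want the similarity rather than the distance, either call cosine_similarity as above or take 1 - cos_dist.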


For further reference, see the scikit-learn docs: Pairwise metrics, Affinities and Kernels