14
votes

It looks like the cosine distance computed by scipy.spatial.distance.cdist:

link to cos distance 1

1 - u*v/(||u||||v||)

is different from sklearn.metrics.pairwise.cosine_similarity, which is

link to cos similarity 2

u*v/(||u||||v||)

Does anybody know the reason for the different definitions?

1
The link that you labeled "link to cos similarity 1" is not cosine similarity, and it is not called that in the link. It is cosine distance. - Warren Weckesser
Think of the trivial case: distance(X, X) should be 0, because the distance from X to X is 0. similarity(X, X) should be the maximum of the function that measures similarity (1 in this case), because X and X are as similar as two things can be. - Warren Weckesser
@WarrenWeckesser, thank you, I fixed the name. - user1700890

1 Answer

23
votes

Good question. Yes, these are two different quantities, but they are connected by the following equation:

Cosine_distance = 1 - cosine_similarity
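
To check this relation numerically, here is a minimal sketch. It assumes NumPy, SciPy, and scikit-learn are installed; the arrays `X` and `Y` are arbitrary illustrative values, not data from the question.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics.pairwise import cosine_similarity

# Two small sets of row vectors (arbitrary illustrative values).
X = np.array([[1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0, 2.0], [3.0, 3.0], [-1.0, 0.0]])

# SciPy returns the cosine distance: 1 - u.v / (||u|| ||v||)
dist = cdist(X, Y, metric="cosine")

# scikit-learn returns the cosine similarity: u.v / (||u|| ||v||)
sim = cosine_similarity(X, Y)

# The two matrices should agree up to floating-point error.
relation_holds = np.allclose(dist, 1.0 - sim)
print(relation_holds)
```

Both calls return a matrix with one row per vector in `X` and one column per vector in `Y`, so the comparison is element by element.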


Why?

Cosine similarity is commonly used as a similarity measure between vectors. A distance can then be defined as 1 - cosine_similarity.

The intuition is that if two vectors point in exactly the same direction, the similarity is 1 (angle = 0) and thus the distance is 0 (1 - 1 = 0).

The same mapping gives the cosine distance for every other value in the similarity range.

Cosine similarity ranges from -1 to 1: -1 means exactly opposite directions, 1 means exactly the same direction, and 0 indicates orthogonality. The corresponding cosine distances are 2, 0, and 1.
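
The three reference cases can be worked through by hand. The sketch below uses only the standard library; `cos_sim` and `cos_dist` are hypothetical helper names written for this illustration, not functions from SciPy or scikit-learn, and the vectors are arbitrary.

```python
import math

def cos_sim(u, v):
    """Cosine similarity: u.v / (||u|| ||v||). Hypothetical helper."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cos_dist(u, v):
    """Cosine distance defined as 1 - cosine similarity. Hypothetical helper."""
    return 1.0 - cos_sim(u, v)

u = [1.0, 0.0]
cases = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite direction": [-4.0, 0.0],
}

# Print similarity and distance for each reference case.
for name, v in cases.items():
    print(name, cos_sim(u, v), cos_dist(u, v))
```

Note that the vector lengths differ in each case; only the direction affects the result.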


References: SciPy, Wolfram
