4
votes

I was reading the paper "Improving Distributional Similarity with Lessons Learned from Word Embeddings" by Levy et al., and while discussing their hyperparameters, they say:

Vector Normalization (nrm) As mentioned in Section 2, all vectors (i.e. W’s rows) are normalized to unit length (L2 normalization), rendering the dot product operation equivalent to cosine similarity.

I then recalled that the default for the sim2 vector similarity function in the R text2vec package is to L2-norm vectors first:

sim2(x, y = NULL, method = c("cosine", "jaccard"), norm = c("l2", "none"))

So I'm wondering, what might be the motivation for this, normalizing and cosine (both in terms of text2vec and in general). I tried to read up on the L2 norm, but mostly it comes up in the context of normalizing before using the Euclidean distance. I could not find (surprisingly) anything on whether L2-norm would be recommended for or against in the case of cosine similarity on word vector spaces/embeddings. And I don't quite have the math skills to work out the analytic differences.

So here is a question, meant in the context of word vector spaces learned from textual data (either just co-occurrence matrices possible weighted by tfidf, ppmi, etc; or embeddings like GloVe), and calculating word similarity (with the goal being of course to use a vector space+metric that best reflects the real-world word similarities).
Is there, in simple words, any reason to (not) use L2 norm on a word-feature matrix/term-co-occurrence matrix before calculating cosine similarity between the vectors/words?

2

2 Answers

2
votes

text2vec handles everything automatically - it will make rows have unit L2 norm and then call dot product to calculate cosine similarity.

But if matrix already has rows with unit L2 norm then user can specify norm = "none" and sim2 will skip first normalization step (saves some computation).

I understand confusion - probably I need to remove norm option (it doesn't take much time to normalize matrix).

2
votes

If you want to get cosine similarity you DON'T need to normalize to L2 norm and then calculate cosine similarity. Cosine similarity anyway normalizes the vector and then takes dot product of two.

If you are calculating Euclidean distance then u NEED to normalize if distance or vector length is not an important distinguishing factor. If vector length is a distinguishing factor then don't normalize and calculate Euclidean distance as it is.