What are the specific steps for computing sentence vectors from word2vec word vectors using the averaging method?

Question

Beginner question, but I am a bit puzzled by this. I hope the answer to this question can benefit other beginners in NLP as well.

Here are some more details:

I know that you can compute sentence vectors from word vectors generated by word2vec. But what are the actual steps involved in making these sentence vectors? Can anyone provide an intuitive example and then some calculations to explain this process?

For example: suppose I have a sentence with three words, "Today is hot", and suppose these words have the hypothetical vector values (1,2,3), (4,5,6), (7,8,9). Do I get the sentence vector by performing component-wise averaging of these word vectors? And what if the vectors are of different lengths, e.g. (1,2), (4,5,6), (7,8,9,23,76)? What does the averaging process look like in these cases?

gojomo · Accepted Answer · 2017-08-12T21:20:58

Creating the vector for a length-of-text (sentence/paragraph/document) by averaging the word-vectors is one simple approach. (It's not great at capturing shades-of-meaning, but it's easy to do.)
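Applied to the hypothetical vectors from the question, averaging is a component-wise mean: each dimension of the sentence vector is the mean of that dimension across the word vectors, i.e. (v_1 + v_2 + ... + v_n) / n. A minimal NumPy sketch, using the question's made-up numbers rather than real word2vec output:

```python
import numpy as np

# Hypothetical 3-dimensional vectors for "Today", "is", "hot" (from the question)
word_vectors = np.array([
    [1.0, 2.0, 3.0],  # Today
    [4.0, 5.0, 6.0],  # is
    [7.0, 8.0, 9.0],  # hot
])

# Average down the rows (axis=0): one mean per dimension
sentence_vector = word_vectors.mean(axis=0)

# The same result written out by hand: ((1+4+7)/3, (2+5+8)/3, (3+6+9)/3)
manual = np.array([(1 + 4 + 7) / 3, (2 + 5 + 8) / 3, (3 + 6 + 9) / 3])

print(sentence_vector)  # [4. 5. 6.]
```

So yes: for this example the sentence vector is (4, 5, 6), with the same number of dimensions as each word vector.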

Using the gensim library, it can be as simple as:

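```python
import numpy as np
from gensim.models.keyedvectors import KeyedVectors

# Load pretrained vectors (the GoogleNews file must be available locally)
wv = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin.gz', binary=True)
text = "the quick brown fox jumped over the lazy dog"
# Average the vectors of words the model knows; an unknown word would raise KeyError
text_vector = np.mean([wv[word] for word in text.split() if word in wv], axis=0)
```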
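The same steps can be sketched without downloading the GoogleNews file by using a small hypothetical dict as a stand-in for `wv` (the words and 4-dimensional values below are invented for illustration; real vectors from that model have 300 dimensions). The hypothetical helper tokenizes, looks up each word, skips words the model does not know, and averages what remains:

```python
import numpy as np

# Hypothetical stand-in for a trained model's word -> vector lookup (made-up values)
toy_vectors = {
    "quick": np.array([0.1, 0.3, -0.2, 0.5]),
    "brown": np.array([0.4, -0.1, 0.0, 0.2]),
    "fox":   np.array([0.3, 0.2, 0.6, -0.1]),
    "lazy":  np.array([-0.2, 0.1, 0.1, 0.0]),
    "dog":   np.array([0.2, 0.5, 0.3, 0.4]),
}

def average_sentence_vector(text, vectors, dim):
    """Average the vectors of the known words in `text` (hypothetical helper)."""
    # Step 1: tokenize with a naive lowercase whitespace split
    tokens = text.lower().split()
    # Step 2: look up each token, skipping out-of-vocabulary words
    found = [vectors[t] for t in tokens if t in vectors]
    # Step 3: if no word was found, fall back to a zero vector
    if not found:
        return np.zeros(dim)
    # Step 4: component-wise mean across the found word vectors
    return np.mean(found, axis=0)

vec = average_sentence_vector("the quick brown fox jumped over the lazy dog", toy_vectors, dim=4)
print(vec)
```

Here "the", "jumped" and "over" are missing from the toy vocabulary, so only five vectors are averaged. Returning a zero vector for an all-unknown text is just one choice; raising an error or returning `None` may suit some applications better.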

Alternatives to consider include averaging the raw word-vectors, averaging unit-normalized word-vectors, or weighting each word-vector by some measure of word significance.
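These alternatives differ only in what happens to each vector before the mean is taken. A sketch of all three on made-up 2-dimensional vectors; the significance weights are invented (an IDF-style score is one common source of such weights, but nothing here is tied to a particular scheme):

```python
import numpy as np

# Hypothetical word vectors (made-up numbers)
vectors = {
    "today": np.array([3.0, 4.0]),  # L2 norm 5
    "is":    np.array([0.0, 2.0]),  # L2 norm 2
    "hot":   np.array([6.0, 8.0]),  # L2 norm 10
}
# Hypothetical significance weights: a common word like "is" gets a low weight
weights = {"today": 1.0, "is": 0.1, "hot": 2.0}

tokens = ["today", "is", "hot"]
mat = np.array([vectors[t] for t in tokens])

# 1) Plain average of the raw vectors
raw_mean = mat.mean(axis=0)

# 2) Unit-normalize each row (divide by its L2 norm), then average,
#    so high-norm vectors no longer dominate the result
unit = mat / np.linalg.norm(mat, axis=1, keepdims=True)
unit_mean = unit.mean(axis=0)

# 3) Weighted average: sum(w_i * v_i) / sum(w_i)
w = np.array([weights[t] for t in tokens])
weighted_mean = np.average(mat, axis=0, weights=w)

print(raw_mean, unit_mean, weighted_mean)
```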

Word-vectors that are compatible with each other (for example, all from the same trained model) will have the same number of dimensions, so there's never an issue of trying to average differently-sized vectors.
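The question's second example, (1,2), (4,5,6) and (7,8,9,23,76), could not have come from one model, so there is no meaningful average of those vectors. A small sketch of a guard that makes this explicit before averaging (the helper name is hypothetical):

```python
import numpy as np

def mean_of_same_dim(vectors):
    """Average vectors, refusing inputs whose dimensions differ (hypothetical helper)."""
    dims = {len(v) for v in vectors}
    if len(dims) != 1:
        raise ValueError(f"word-vectors have mismatched dimensions: {sorted(dims)}")
    return np.mean(np.asarray(vectors, dtype=float), axis=0)

print(mean_of_same_dim([(1, 2, 3), (4, 5, 6), (7, 8, 9)]))  # [4. 5. 6.]

try:
    mean_of_same_dim([(1, 2), (4, 5, 6), (7, 8, 9, 23, 76)])
except ValueError as err:
    print(err)  # word-vectors have mismatched dimensions: [2, 3, 5]
```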

Other techniques like 'Paragraph Vectors' (Doc2Vec in gensim) might give better text-vectors for some purposes, on some corpora.

Other techniques for comparing the similarity of texts that leverage word-vectors, like "Word Mover's Distance" (WMD), might give better pairwise text-similarity scores than comparing single summary vectors. (WMD doesn't reduce a text to a single vector, and can be expensive to calculate.)
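For contrast with WMD, the "single summary vector" approach reduces each text to its average vector and compares the two vectors, commonly with cosine similarity. A minimal sketch with made-up 3-dimensional vectors (a real comparison would use the model's vectors):

```python
import numpy as np

# Hypothetical word vectors (made-up values)
vectors = {
    "cats": np.array([1.0, 0.0, 1.0]),
    "dogs": np.array([1.0, 0.2, 0.8]),
    "purr": np.array([0.0, 1.0, 1.0]),
    "bark": np.array([0.2, 1.0, 0.6]),
}

def text_vector(text):
    # Average the vectors of known words (assumes at least one word is known)
    return np.mean([vectors[w] for w in text.split() if w in vectors], axis=0)

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the L2 norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine(text_vector("cats purr"), text_vector("dogs bark"))
print(round(sim, 3))
```

Each text collapses to one vector before comparison, so which word in one text lines up with which word in the other is lost; WMD keeps that word-level matching, at a higher computational cost.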
