After weeks of searching, I've found the answer. This could be useful to anyone interested in understanding Word2Vec (and word embeddings in general) as opposed to just using it.
When training the neural network, the input is a one-hot vector of size |V| (the vocabulary size), and the output can be a concatenation (or average) of one-hot vectors, which is your context. In the middle sits the hidden layer with d units, so the input-to-hidden weight matrix is a |V| x d matrix. Each row in that matrix is the word embedding corresponding to the non-zero unit of your one-hot input vector.
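In case it helps, here is roughly what that looks like in NumPy. This is just a sketch of the forward pass, assuming the skip-gram variant predicting a single context word; the sizes and names like `W_in` / `W_out` are purely illustrative, not anyone's official implementation:

```python
import numpy as np

V, d = 10_000, 300                     # vocabulary size |V| and embedding dimension d (made-up values)
W_in = np.random.randn(V, d) * 0.01    # input-to-hidden weights: one row per word = the embeddings
W_out = np.random.randn(d, V) * 0.01   # hidden-to-output weights

def forward(word_index: int) -> np.ndarray:
    """Forward pass for one centre word; returns a probability distribution over the vocab."""
    one_hot = np.zeros(V)
    one_hot[word_index] = 1.0
    hidden = one_hot @ W_in            # equals W_in[word_index]: that word's embedding
    scores = hidden @ W_out            # one score per vocabulary word
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()         # softmax over the context-word distribution
```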
For example, if a word is encoded as the one-hot vector [0, 0, 1, 0], it is fed into your neural network transposed (as a column vector). Notice only one unit is non-zero, so only that input unit fires into the hidden units. That means the 3rd row of the matrix is the only one we care about, hence your word embedding is just a row of that matrix.
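Here is a tiny toy check of that row-selection claim, using the same 4-word example (the matrix values are made up; only the shapes and the lookup matter):

```python
import numpy as np

W = np.arange(12, dtype=float).reshape(4, 3)   # |V| = 4 words, d = 3 dimensions
one_hot = np.array([0.0, 0.0, 1.0, 0.0])       # the example one-hot vector

hidden = one_hot @ W        # multiplying by a one-hot vector just picks out a row
print(hidden)               # [6. 7. 8.]
print(W[2])                 # [6. 7. 8.] -- the 3rd row IS the word's embedding
assert np.allclose(hidden, W[2])
```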
I hope that helps anyone interested (maybe I'm the only one?)