3 votes

As I understand it, Word2Vec builds a word dictionary (or vocabulary) based on a training corpus, and outputs a K-dimensional vector for each word in the dictionary. My question is: what exactly is the source of those K-dimensional vectors? I'm assuming each vector is either a row or a column in one of the weight matrices, either between the input and hidden layer or between the hidden and output layer. However, I haven't been able to find any sources to back this up, and I'm not literate enough in programming languages to examine the source code and figure it out myself. Any clarifying remarks on this topic would be greatly appreciated!


2 Answers

1 vote

what exactly is the source of those K-dimensional vectors? I'm assuming each vector is either a row or a column in one of the weight matrices, either between the input and hidden layer or between the hidden and output layer.

In the word2vec models (CBOW and skip-gram), the output is a feature matrix for the words. That matrix is the first weight matrix, between the input layer and the projection layer (word2vec has no hidden layer in the usual sense, and no activation function there). When we train on a word in its context (in the CBOW model), we update this weight matrix. (The second matrix, between the projection and output layers, is also updated, but it is not used for the word vectors.)

In this first matrix, each row corresponds to a vocabulary word and the columns are that word's features (the K dimensions).
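To make that concrete, here is a minimal NumPy sketch (toy vocabulary, random weights; not the actual word2vec code) showing that a word's K-dimensional vector is simply its row in that first weight matrix, and that multiplying the word's one-hot vector by the matrix selects the same row:

```python
import numpy as np

# Toy sketch (not the real word2vec implementation): the word vectors are
# just the rows of the first weight matrix W, indexed by word id.
vocab = {"cat": 0, "dog": 1, "apple": 2}   # made-up toy vocabulary
V, K = len(vocab), 5                       # vocabulary size, vector dimension

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, K))     # input->projection weights, shape (V, K)

# A word's K-dim vector is its row of W; multiplying the word's
# one-hot vector by W selects exactly the same row.
one_hot = np.zeros(V)
one_hot[vocab["cat"]] = 1.0
assert np.allclose(one_hot @ W, W[vocab["cat"]])
print(W[vocab["cat"]])                     # this row is the vector for "cat"
```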

If you want more information, see:

http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/

1 vote

word2vec uses machine learning to obtain word representations. It predicts a word using its context (CBOW) or vice versa (skip-gram).

In machine learning, you have a loss function that represents the error your model makes. This error depends on the model's parameters, and training the model means minimizing the error with respect to those parameters.

In word2vec, the embedding matrices are the model's parameters that get updated during training. I hope this helps you understand where they come from: they are initialized randomly and then changed over the course of the training process.

You can take a look at this picture from this paper: simple word2vec

The W matrix, which maps the input one-hot word representations to the k-dimensional vectors, and the W' matrix, which maps a k-dimensional representation back to the output, are both model parameters that we optimize during training.
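To illustrate, here is a hedged single-step sketch with toy sizes and made-up word ids; it uses a plain softmax loss in place of word2vec's negative sampling or hierarchical softmax, but the point is the same: both W and W' start out random, and each gradient step changes them, which is exactly where the learned word vectors come from.

```python
import numpy as np

# Toy one-step sketch with a plain softmax loss (real word2vec uses
# negative sampling or hierarchical softmax, but the principle is the same):
# W and W' are random at first and are changed by gradient updates.
V, K, lr = 10, 4, 0.1                       # toy vocab size, dimension, learning rate
rng = np.random.default_rng(1)
W  = rng.normal(scale=0.1, size=(V, K))     # input embeddings (the vectors you keep)
Wp = rng.normal(scale=0.1, size=(K, V))     # output ("context") embeddings, i.e. W'

center, context = 3, 7                      # a made-up (center word, context word) pair

h = W[center]                               # projection: the center word's K-dim row
scores = h @ Wp                             # one score per vocabulary word
p = np.exp(scores - scores.max())
p /= p.sum()                                # softmax over the vocabulary

# Gradient of the cross-entropy loss for the observed context word
dscores = p.copy()
dscores[context] -= 1.0
dWp = np.outer(h, dscores)                  # gradient w.r.t. W'
dh  = Wp @ dscores                          # gradient w.r.t. the center word's row of W

Wp        -= lr * dWp                       # both parameter matrices get updated;
W[center] -= lr * dh                        # the updated rows of W are the word vectors
```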