3
votes

In word2vec, training produces two weight matrices: (1) the input-hidden weight matrix and (2) the hidden-output weight matrix. People typically use the input-hidden weight matrix as the word vectors (each row corresponds to one word). This leads to my confusion:

  1. Why do people use the input-hidden weight matrix as the word vectors instead of the hidden-output weight matrix?
  2. Why don't we apply the softmax activation function to the hidden layer rather than the output layer, which would make it less time-consuming?

Any clarifying remarks on the intuition behind obtaining word vectors this way would also be appreciated.

2 Answers

1
votes

Regarding the two matrices (input-hidden and hidden-output), there is an interesting research paper: "A Dual Embedding Space Model for Document Ranking", Mitra et al., arXiv 2016 (https://arxiv.org/pdf/1602.01137.pdf). Much like your question, the paper studies how these two weight matrices differ, and claims that they encode different characteristics of words.

Overall, as I understand it, you are free to use the input-hidden weight matrix (the convention), the hidden-output weight matrix, or a combination of the two as word embeddings, depending on your data and the problem you are solving.
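To make the three options concrete, here is a minimal sketch. The weights are made-up toy values, not trained ones; the helper name `embedding` and the `mode` labels are hypothetical; and averaging is only one assumed way of "combining" the two matrices (the paper may combine them differently).

```python
# Toy weights for a vocabulary of 3 words and a hidden size of 2.
# All values are invented for illustration; nothing here is trained.
vocab = ["king", "queen", "apple"]

# Input-hidden matrix: one row per word (shape |V| x d).
W_in = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.7],
]

# Hidden-output matrix, stored here with one row per word as well
# (the transpose of the d x |V| layout used in some presentations).
W_out = [
    [0.5, 0.3],
    [0.4, 0.4],
    [0.2, 0.9],
]


def embedding(word, mode="in"):
    """Return the vector for `word` under one of three conventions."""
    i = vocab.index(word)
    if mode == "in":    # the conventional choice
        return list(W_in[i])
    if mode == "out":   # the hidden-output vectors
        return list(W_out[i])
    if mode == "avg":   # one assumed way of combining the two
        return [(a + b) / 2 for a, b in zip(W_in[i], W_out[i])]
    raise ValueError(f"unknown mode: {mode}")


for mode in ("in", "out", "avg"):
    print(mode, embedding("king", mode))
```

Which of the three works best is an empirical question for your task; the sketch only shows that all of them are just different ways of reading vectors out of the same trained model.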

1
votes

For question 1:

This is because (in the skip-gram formulation) the input weight matrix holds the vectors for target words, while the output weight matrix holds the vectors for context words. The vector we want to learn for a word is its vector as the target word, since the intuition behind word2vec is that words (as target words!) which occur in similar contexts learn similar vector representations.
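The roles of the two matrices can be sketched as a single skip-gram forward pass. This is a simplified illustration that uses a full softmax with made-up toy weights; the function names are hypothetical, and real implementations usually replace the full softmax with negative sampling or hierarchical softmax.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_probabilities(target_index, W_in, W_out):
    """P(context word | target word) for every word in the vocabulary."""
    # Hidden layer: a plain row lookup in the input matrix, no activation.
    h = W_in[target_index]
    # Output layer: one score per word, the dot product of h with that
    # word's row in the output (context) matrix.
    scores = [sum(a * b for a, b in zip(h, row)) for row in W_out]
    # Softmax turns the scores into a distribution over context words.
    return softmax(scores)


# Toy matrices (3 words, hidden size 2); values are invented.
W_in = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7]]
W_out = [[0.5, 0.3], [0.4, 0.4], [0.2, 0.9]]

probs = context_probabilities(0, W_in, W_out)
print(probs)
```

Note that the hidden layer is just the target word's row of `W_in`, so that row is the learned representation of the word. Two target words that must predict similar context distributions are pushed toward similar rows.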

The vector for a context word exists only for the purpose of training. It is possible to use the same vector for a word in both roles, but learning the two separately works better. For example, if you share the vector representations, the model tends to assign a very high probability to a word occurring in its own context, because the score is the dot product of a vector with itself (its squared norm). That is clearly counterintuitive: how often do you use two identical words one after another?
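A small numeric sketch of this effect, under two assumptions that are mine rather than part of word2vec: the target and context vectors are tied (a single matrix `V`), and all vectors are normalized to unit length. With equal norms the self-score is always the largest; with unequal norms a longer vector could still score higher, so treat this as an illustration of the tendency, not a proof.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Invented vectors for 3 words, normalized to unit length.
raw = [[3.0, 1.0], [1.0, 3.0], [-2.0, 2.0]]
V = [[x / math.sqrt(dot(v, v)) for x in v] for v in raw]

target = 0
# Tied weights: the same matrix serves as both input and output vectors.
scores = [dot(V[target], v) for v in V]
probs = softmax(scores)

# The word most likely to appear next to word 0 is word 0 itself.
print(probs.index(max(probs)) == target)
```

With separate input and output matrices, the score of a word with itself is the dot product of two different vectors, so training is free to make it small.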