I understand conceptually how word2vec and doc2vec work, but am struggling with the nuts and bolts of how the numbers in the vectors get processed algorithmically.
If the one-hot vectors for three context words are [1, 0, 0, 0], [0, 1, 0, 0], and [0, 0, 1, 0],
and the vector for the target word is [0, 0, 0, 1], does the algorithm perform one forward/backward pass for each input/target pair, like this:
[1, 0, 0, 0] --> [0, 0, 0, 1]
[0, 1, 0, 0] --> [0, 0, 0, 1]
[0, 0, 1, 0] --> [0, 0, 0, 1]
or are the input (context) vectors first added together, like this:
[1, 1, 1, 0] --> [0, 0, 0, 1]
or is some other process used?
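To make the second interpretation concrete, here is a minimal numpy sketch of what I imagine the "added together" (CBOW-style) forward pass looks like. The weight-matrix names `W_in` and `W_out` and the choice of averaging rather than summing are my assumptions, not something I've confirmed from an implementation:

```python
import numpy as np

np.random.seed(0)
V, N = 4, 3  # vocabulary size, hidden (embedding) size

# one-hot rows for the three context words, and the one-hot target
contexts = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 1, 0]], dtype=float)
target = np.array([0, 0, 0, 1], dtype=float)

W_in = np.random.randn(V, N)   # input->hidden weights (word vectors)
W_out = np.random.randn(N, V)  # hidden->output weights

# average the context one-hot vectors; because they are one-hot,
# this is the same as averaging their rows of W_in
h = contexts.mean(axis=0) @ W_in

# score every vocabulary word and softmax-normalize
scores = h @ W_out
probs = np.exp(scores) / np.exp(scores).sum()

# a single backward pass for the whole context set against one target
grad_scores = probs - target  # cross-entropy gradient w.r.t. scores
```

Under this reading there is one update per (context set, target) pair, whereas under the first reading there would be three separate updates.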
Additionally, do the document vectors used in doc2vec take the same one-hot form as the word vectors, or are documents simply tagged with integer IDs such as 1, 2, 3, and so on?
I understand that the document tags are included as input nodes during training, but how are they used at test time? When entering the context word vectors to predict the target word (or vice versa) during testing, shouldn't an input for some document ID be required as well?
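Here is my best guess at how the test phase might handle a previously unseen document, sketched in numpy: freeze the trained word and output weights, and gradient-descend only a fresh document vector (I believe gensim calls this `infer_vector`, but the details below — averaging the word and document vectors, the learning rate, the toy (context, target) pairs — are all my assumptions:

```python
import numpy as np

np.random.seed(1)
V, N = 4, 3
W_in = np.random.randn(V, N)   # pretend: trained word vectors (frozen)
W_out = np.random.randn(N, V)  # pretend: trained output weights (frozen)

doc_vec = np.random.randn(N) * 0.01  # fresh vector for the unseen document

# toy (context word id, target word id) pairs from the new document
pairs = [(0, 1), (1, 2), (2, 3)]

lr = 0.1
for _ in range(50):
    for ctx, tgt in pairs:
        h = (W_in[ctx] + doc_vec) / 2.0      # combine word and doc vectors
        scores = h @ W_out
        probs = np.exp(scores) / np.exp(scores).sum()
        grad_scores = probs.copy()
        grad_scores[tgt] -= 1.0              # cross-entropy gradient
        grad_h = W_out @ grad_scores
        doc_vec -= lr * grad_h / 2.0         # update ONLY the doc vector
```

If something like this is what happens, then the "document ID input" at test time is effectively a new trainable slot rather than a lookup of an existing tag — but I'd appreciate confirmation.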