How are word vectors co-trained with paragraph vectors in doc2vec DBOW?

Question

I don't understand how word vectors are involved at all in the training process with gensim's doc2vec in DBOW mode (dm=0). I know that it's disabled by default with dbow_words=0. But what happens when we set dbow_words to 1?

In my understanding of DBOW, the context words are predicted directly from the paragraph vectors. So the only parameters of the model are the N p-dimensional paragraph vectors plus the parameters of the classifier.

But multiple sources hint that it is possible in DBOW mode to co-train word and doc vectors. For instance:

section 5 of An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation
this SO answer: How to use Gensim doc2vec with pre-trained word vectors?

So, how is this done? Any clarification would be much appreciated!

Note: for DM, the paragraph vectors are averaged/concatenated with the word vectors to predict the target words. In that case, it's clear that words vectors are trained simultaneously with document vectors. And there are N*p + M*q + classifier parameters (where M is vocab size and q word vector space dim).

gojomo gojomo · Accepted Answer · 2019-04-09T16:28:31

If you set dbow_words=1, then skip-gram word-vector training is added the to training loop, interleaved with the normal PV-DBOW training.

So, for a given target word in a text, 1st the candidate doc-vector is used (alone) to try to predict that word, with backpropagation adjustments then occurring to the model & doc-vector. Then, a bunch of the surrounding words are each used, one at a time in skip-gram fashion, to try to predict that same target word – with the followup adjustments made.

Then, the next target word in the text gets the same PV-DBOW plus skip-gram treatment, and so on, and so on.

As some logical consequences of this:

training takes longer than plain PV-DBOW - by about a factor equal to the window parameter
word-vectors overall wind up getting more total training attention than doc-vectors, again by a factor equal to the window parameter

How are word vectors co-trained with paragraph vectors in doc2vec DBOW?

1 Answers