Doc2vec Gensim: word embeddings not updating during each epoch

Question

I am using the Gensim Doc2vec model to train document vectors. I printed out the vector for the word 'good' after every epoch and found that it never changes! But when I printed out the vector for the document with id '3', it was different every epoch!

My code is below; I don't know what is happening.

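import time
import gensim

# documents: the corpus, an iterable of gensim.models.doc2vec.TaggedDocument
# objects whose tags are strings such as '3' (defined elsewhere).
# gensim 4.x API: vector_size (formerly size), model.wv / model.dv
# (formerly model[...] / model.docvecs).
model = gensim.models.Doc2Vec(dm=0, alpha=0.1, vector_size=20, min_alpha=0.025)

time3 = time.time()
model.build_vocab(documents)
time4 = time.time()

print('Building model....', (time4 - time3))
for epoch in range(10):
    print('Now training epoch %s' % epoch)
    # train() needs the corpus size and an explicit number of epochs
    model.train(documents, total_examples=model.corpus_count, epochs=1)

    print(model.wv['good'])   # word vector for 'good'
    print(model.dv[str(3)])   # doc vector for the document tagged '3'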

gojomo · Accepted Answer · 2016-09-27T04:30:48

The pure PV-DBOW model (dm=0) doesn't use or train word-vectors at all. (It's just an artifact of the code shared with Word2Vec that they're allocated and randomly initialized at all.)

If you want word-vectors to be trained in an interleaved fashion, you must use the non-default dbow_words=1 parameter. (Or, switch to PV-DM mode, dm=1, where word-vectors are inherently involved.)

