Getting numpy vector from a trained Doc2Vec model for each document

Question

This is my first time using Doc2Vec. I'm trying to classify works by author. I have trained a model with Labeled Sentences (paragraphs, or strings of a specified length), with words = the list of words in the paragraph, and tags = the author's name. In my case I only have two authors. I tried accessing the docvecs attribute of the trained model, but it only contains two elements, corresponding to the two tags I used when I trained the model. I'm trying to get the Doc2Vec numpy representation of each paragraph I fed in to training so I can use those as training data later on. How can I do this? Thanks.

gojomo · Accepted Answer · 2017-11-08T05:37:06

Bulk training only creates vectors for the tags you supplied. If you want to read out a bulk-trained vector per paragraph (as if by model.docvecs['paragraph000']), you have to give each paragraph a unique tag during training (like 'paragraph000'). You can give docs other tags as well - but bulk training only creates and remembers doc-vectors for supplied tags.

After training, you can infer vectors for any other texts you supply to infer_vector() - and of course you could supply the same paragraphs that were used during training.
