This is my first time using Doc2Vec, and I'm trying to classify works by author. I trained a model on labeled sentences (paragraphs, or strings of a specified length), with words = the list of words in the paragraph and tags = the author's name. In my case I only have two authors. When I access the docvecs attribute of the trained model, it contains only two elements, corresponding to the two tags I used during training. I'm trying to get the Doc2Vec numpy representation of each paragraph I fed into training, so I can use those vectors as training data later on. How can I do this? Thanks.
Answer
Bulk training only creates vectors for the tags you supplied. If you want to read out a bulk-trained vector per paragraph (as in model.docvecs['paragraph000']), you have to give each paragraph a unique tag during training (like 'paragraph000'). You can give documents other tags as well, such as the author's name, but bulk training only creates and remembers doc-vectors for the tags you supplied.
After training, you can infer vectors for any other texts by passing their word lists to infer_vector(), and of course you could supply the same paragraphs that were used during training.