I am trying to use gensim's doc2vec to train a model on a set of documents and a matching set of labels. The labels were created manually and need to be passed into the program for training. So far I have two lists: a list of sentences, and a list of labels with one label per sentence. I need to use doc2vec specifically. Here is what I have tried so far:
from gensim import utils
from gensim.models import Doc2Vec
tweets = ["A tweet", "Another tweet", "A third tweet", ... , "A thousandth-something tweet"]
labels_list = [1, 1, 3, ... , 16]
tagged_data = [tweets, labels_list]
model = Doc2Vec(size=20, alpha=0.025, min_alpha=0.00025, min_count=1, dm=1)
model.build_vocab(tagged_data)
for epoch in range(max_epochs):
    model.train(tagged_data, total_examples=model.corpus_count,
                epochs=model.iter)
    model.alpha -= 0.0002
    model.min_alpha = model.alpha
The line model.build_vocab(tagged_data) raises AttributeError: 'list' object has no attribute 'words'. Searching for this error suggests wrapping each document in a labeled sentence object, but I am not sure whether that works when the labels are predefined. Does anyone know how to put predefined labels into doc2vec? Thanks in advance.
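For reference, the labeled-sentence suggestion seems to amount to something like the sketch below, which pairs each tweet with its own predefined label. It is only a rough illustration under several assumptions: the three-item lists are shortened stand-ins for the real data, splitting on whitespace is a placeholder tokenizer, and the "label_" prefix is a made-up convention for turning the integer labels into string tags. gensim's TaggedDocument is a named tuple with the fields words and tags, so the fallback definition only exists to keep the snippet runnable where gensim is not installed.

```python
from collections import namedtuple

try:
    from gensim.models.doc2vec import TaggedDocument
except ImportError:
    # Assumption: stand-in with the same two fields as gensim's TaggedDocument,
    # used only so this sketch runs without gensim installed.
    TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])

# Shortened stand-ins for the real lists in the question.
tweets = ["A tweet", "Another tweet", "A third tweet"]
labels_list = [1, 1, 3]

# One object per document: a list of tokens plus a list of tags.
# Whitespace splitting and the "label_" prefix are illustrative choices.
tagged_data = [
    TaggedDocument(words=tweet.lower().split(), tags=["label_%d" % label])
    for tweet, label in zip(tweets, labels_list)
]

for doc in tagged_data:
    print(doc.words, doc.tags)
```

Each item in this list has a words attribute, which is what the AttributeError says the plain list [tweets, labels_list] is missing. Documents that share a label share a tag here, so the model would learn one vector per label rather than one per tweet.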