I have several thousand documents that I'd like to use in a gensim doc2vec model, but I only have 5grams for each of the documents, not the full texts in their original word order. In the doc2vec tutorial on the gensim website (https://radimrehurek.com/gensim/auto_examples/tutorials/run_doc2vec_lee.html), a corpus is created with full texts and then the model is trained on that corpus. It looks something like this:
[TaggedDocument(words=['hundreds', 'of', 'people', 'have', 'been', 'forced', 'to', 'vacate', 'their', 'homes', 'in', 'the', 'southern',...], tags=[1]), TaggedDocument(words=[.....], tags=[2]),...]
Is it possible to create a training corpus where each document consists of a list of 5grams rather than a list of words in their original order?