I want to compute the cosine similarity between sentences. I have tested Doc2Vec with gensim, training it on only the few sentences hard-coded in my script. Now I want to train the model on a text file that has one sentence per line. How can I use such a file as the training input?
1 Answer
If your document is already in the form of a text file, with one-sentence-per-line, then many of the examples included with gensim (or elsewhere) show how to handle such a corpus.
For example, there's an introductory Doc2Vec tutorial notebook bundled with gensim in its docs/notebooks directory, which you can also view online in the project's GitHub repository:
https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb
Its cell (3) shows, and cell (4) uses, a function to read a file line-by-line, and turn it into the TaggedDocument texts that the model requires.