1
votes

I want to compute the cosine similarity between sentences. I have tested Doc2Vec with gensim, training it on only the few sentences given in the code. Now I want to train my model on a text document that has one sentence per line. How can I use such a document as the training corpus?

1
Welcome to StackOverflow! Please update your question to show what you have already tried in a Minimal, Complete, and Verifiable example. For further information, please see How to Ask. - Raoslaw Szamszur

1 Answer

0
votes

If your document is already in the form of a text file, with one-sentence-per-line, then many of the examples included with gensim (or elsewhere) show how to handle such a corpus.

For example, there's an introductory Doc2Vec tutorial notebook bundled with gensim in its docs/notebooks directory, which you can also view online in the project's GitHub repository:

https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-lee.ipynb

Cell (3) of that notebook defines, and cell (4) uses, a function that reads a file line by line and turns each line into the TaggedDocument objects that the model requires.
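
As a rough sketch of that approach (not a copy of the notebook's code), something like the following should work. It assumes gensim 4.x, where the trained document vectors live in `model.dv` (older releases used `model.docvecs`). The helper names, the naive lowercase/whitespace tokenizer, and the parameter values are illustrative choices of mine, not anything prescribed by gensim.

```python
def iter_tokenized_lines(path):
    """Yield (sentence_index, tokens) for every non-empty line of a text file.

    Blank lines are skipped, so the indices stay contiguous (0, 1, 2, ...).
    """
    index = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Naive tokenizer (an assumption): lowercase, split on whitespace.
            tokens = line.lower().split()
            if tokens:
                yield index, tokens
                index += 1


def read_corpus(path):
    """Wrap each sentence in the TaggedDocument form that Doc2Vec trains on."""
    # gensim is imported here so the tokenizing helper above works without it.
    from gensim.models.doc2vec import TaggedDocument

    return [TaggedDocument(words=tokens, tags=[index])
            for index, tokens in iter_tokenized_lines(path)]


def train_and_compare(path, first, second):
    """Train on the file, then return the cosine similarity of two sentences.

    `first` and `second` are the integer tags assigned by read_corpus.
    """
    from gensim.models.doc2vec import Doc2Vec

    corpus = read_corpus(path)
    # Illustrative settings only; min_count=1 keeps rare words in a tiny corpus.
    model = Doc2Vec(vector_size=50, min_count=1, epochs=40)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    return model.dv.similarity(first, second)
```

Calling `train_and_compare("sentences.txt", 0, 1)` (with a hypothetical file name) would then give the cosine similarity between the first and second non-empty lines. With a very small corpus the resulting similarities tend to be noisy, so treat them with caution.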