0
votes

I am confused about how to use Doc2Vec (via Gensim) on the IMDB sentiment classification dataset. I have obtained Doc2Vec embeddings by training on my corpus and built a Logistic Regression model on them. How do I use it to make predictions for new reviews? sklearn's TF-IDF vectorizer has a transform method that can be applied to test data after fitting on training data; what is the equivalent in Gensim Doc2Vec?


2 Answers

1
votes

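To get a vector for an unseen document, use vector = model.infer_vector(["new", "document"]). Note that infer_vector expects a list of tokens, so preprocess and tokenize the new review the same way you did the training texts. Then feed vector into your classifier: preds = clf.predict([vector]).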

0
votes

Have you seen the demo notebook, included with the gensim source code through gensim-3.8.1, which applies Doc2Vec to the IMDB dataset?

https://github.com/RaRe-Technologies/gensim/blob/3.8.1/docs/notebooks/doc2vec-IMDB.ipynb