I am trying out Google's pre-trained word2vec model to get word embeddings. I can load the model and get a 300-dimensional representation of a single word. Here is the code:
from gensim.models import KeyedVectors

# load the pre-trained 300-dimensional Google News vectors
model = KeyedVectors.load_word2vec_format('/Downloads/GoogleNews-vectors-negative300.bin', binary=True)
dog = model['dog']
print(dog.shape)
which gives me the following output:
>>> print(dog.shape)
(300,)
This works, but I am interested in obtaining a vector representation for an entire document, not just a single word. Indexing the model with a whole sentence fails, since lookups are per-token only:
dog_sentence = model['it is a cute little dog']
KeyError: "word 'it is a cute little dog' not in vocabulary"
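In the meantime, the simplest workaround I can think of is to average the vectors of the individual tokens. Below is a minimal sketch, assuming whitespace tokenization is good enough, that out-of-vocabulary tokens can simply be skipped, and that at least one token is in the vocabulary:

import numpy as np

def document_vector(model, text):
    # keep only tokens covered by the pre-trained vocabulary
    tokens = [t for t in text.lower().split() if t in model]
    # average the per-token 300-d vectors into one document vector
    return np.mean([model[t] for t in tokens], axis=0)

dog_sentence = document_vector(model, 'it is a cute little dog')
print(dog_sentence.shape)  # (300,)

Is averaging like this a reasonable approach, or is there a more standard way to get document vectors out of word2vec?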
I plan to apply this to many documents and then train a clustering model on the resulting document vectors for unsupervised learning and topic modeling.
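For context, the downstream step would look roughly like this, reusing the document_vector helper above; documents is a hypothetical list of raw text strings, and scikit-learn's KMeans with an arbitrary cluster count is just a stand-in for whatever clustering method ends up working best:

import numpy as np
from sklearn.cluster import KMeans

# documents is a hypothetical list of raw text strings
doc_vectors = np.vstack([document_vector(model, doc) for doc in documents])
# cluster the document vectors; the cluster count here is arbitrary
labels = KMeans(n_clusters=10, random_state=0).fit_predict(doc_vectors)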