
I tried to load GoogleNews-vectors-negative300.bin and call the predict_output_word method.

I tested three ways, but each one failed; the code and error for each are shown below.

import gensim
from gensim.models import Word2Vec
1. The first:

I first used this line:

model=Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin',binary=True)


print(model.wv.predict_output_word(['king','man'],topn=10))

error:

DeprecationWarning: Deprecated. Use gensim.models.KeyedVectors.load_word2vec_format instead.
2. The second:

Then I tried:

model = gensim.models.KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin',binary=True)


print(model.wv.predict_output_word(['king','man'],topn=10))

error:

AttributeError: 'Word2VecKeyedVectors' object has no attribute 'predict_output_word'
3. The third:

model = gensim.models.Word2Vec.load('GoogleNews-vectors-negative300.bin')

print(model.wv.predict_output_word(['king','man'],topn=10))

error:

_pickle.UnpicklingError: invalid load key, '3'.

I read the documentation at

https://radimrehurek.com/gensim/models/word2vec.html

but I still can't figure out which namespace predict_output_word lives in.

Can anybody help?

Thanks.


1 Answer


The GoogleNews set of vectors is just the raw vectors – without a full trained model (including internal weights). So it:

  • can't be loaded as a fully-functional gensim Word2Vec model
  • can be loaded as a lookup-only KeyedVectors, but that object alone doesn't have the data or protocols necessary for further model training or other functionality

Google hasn't released the full model that was used to create the GoogleNews vector set.

Note also that the predict_output_word() function in gensim should be considered an experimental curiosity. It doesn't work in hierarchical-softmax models (where generating ranked predictions isn't as simple), and it doesn't use quite the same context-window weighting as training does.

Predicting words isn't really the point of the word2vec algorithm – and many implementations don't offer any interface for making individual word-predictions outside of the sparse bulk training process. Rather, word2vec uses the exercise of (sloppily) trying to make predictions to train word-vectors that turn out to be useful for other, non-word-prediction, purposes.