4
votes

I'm training my own word2vec model on different data. To plug the resulting model into my classifier and compare the results with the original pre-trained Word2vec model, I need to save it in the binary .bin format. Here is my code; sentences is a list of short messages.

import gensim, logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
sentences = gensim.models.word2vec.LineSentence('dati.txt')
model = gensim.models.Word2Vec(
    sentences, size=300, window=5, min_count=5, workers=5,
    sg=1, hs=1, negative=0
)
model.save_word2vec_format('model.bin', binary=True)

The last method, save_word2vec_format, gives me this error:

AttributeError: 'Word2Vec' object has no attribute 'save_word2vec_format'

What am I missing here? I've read the gensim documentation and other forums. This repo on GitHub uses almost the same configuration, so I cannot understand what's wrong. I've tried switching from skip-gram to CBOW and from hierarchical softmax to negative sampling, but the error persists.

Thank you in advance!

2 Answers

5
votes

Are you using a pre-release (release candidate) version of gensim, or code taken directly from the develop branch?

In those versions, save_word2vec_format() has moved to a utility class called KeyedVectors, so it is no longer available on the Word2Vec object itself.

As of February 2017, you won't get these versions through the usual way of installing gensim, pip install gensim. By the time this change reaches the official distribution, the error message for the older call will likely have been improved.

I recommend using the version that comes via plain pip install gensim unless you are a relatively expert user who is also carefully following the project CHANGELOG.md.
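
If the same script has to run against both layouts, one option is to look for the vectors object first and fall back to the model itself. This is only a minimal sketch: save_vectors_binary is a hypothetical helper name, not part of gensim, and it assumes that newer versions expose the KeyedVectors instance as model.wv while older ones keep save_word2vec_format() on the model.

```python
def save_vectors_binary(model, path):
    """Export word vectors in word2vec binary format (hypothetical helper)."""
    # Assumption: newer gensim keeps the vectors on `model.wv` (KeyedVectors);
    # older releases have no `wv` and offer the method on the model itself.
    target = getattr(model, "wv", model)
    target.save_word2vec_format(path, binary=True)
    # Return the object that was actually used, which helps when debugging.
    return target
```

With the model from the question, this would be called as save_vectors_binary(model, 'model.bin').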

5
votes
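from gensim.models import Word2Vec, KeyedVectors
# model is the trained Word2Vec from the question; its vectors live on model.wv (a KeyedVectors object)
model.wv.save_word2vec_format('model.bin', binary=True)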
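
To check that the export matches the pre-trained model you want to compare against, you can peek at the file header without loading the vectors. A rough sketch, assuming the file follows the usual word2vec layout whose first line is a text header holding the vocabulary size and the vector dimensionality; read_word2vec_header is a made-up name for illustration.

```python
def read_word2vec_header(path):
    """Return (vocab_size, vector_size) from the first line of a word2vec file."""
    with open(path, "rb") as f:
        # The header is plain text even in binary files; the raw floats follow it.
        header = f.readline()
    vocab_size, vector_size = (int(x) for x in header.split())
    return vocab_size, vector_size
```

For the model trained in the question, the second value should be 300, matching the size argument.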