Having trouble loading custom trained word vectors created in Gensim, into Spacy

Question

I've trained a model:

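from gensim.models import Word2Vec

# gensim 3.x parameter names (gensim 4.0 renamed size -> vector_size, iter -> epochs)
model = Word2Vec(master_sent_list,  # my tokenized sentences
                 min_count=5,
                 size=300,
                 workers=5,
                 window=5,
                 iter=30)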

Saved it according to this post:

model.wv.save_word2vec_format("../moj_word2vec.txt")
!gzip ../moj_word2vec.txt
!python -m spacy init-model en ../moj_word2vec.model --vectors-loc ../moj_word2vec.txt.gz

Everything looks fine:

✔ Successfully created model
22470it [00:02, 8397.55it/s]
✔ Loaded vectors from ../moj_word2vec.txt.gz
✔ Sucessfully compiled vocab
22835 entries, 22470 vectors
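For context on the file handed to `--vectors-loc`: with its default `binary=False`, `save_word2vec_format` writes plain text. The first line gives the number of words and the vector dimensionality, and each following line holds one word followed by its space-separated components. The stdlib-only sketch below writes a tiny gzipped file in that layout and reads it back. The `write_word2vec_text` helper and the toy vectors are hypothetical illustrations, not gensim's own code.

```python
import gzip
import os
import tempfile

def write_word2vec_text(path, vectors):
    """Hypothetical helper: write {word: [floats]} in word2vec text format, gzipped."""
    dim = len(next(iter(vectors.values())))
    with gzip.open(path, "wt", encoding="utf-8") as f:
        # Header line: "<number of words> <vector dimensionality>"
        f.write(f"{len(vectors)} {dim}\n")
        for word, vec in vectors.items():
            # One line per word: the word, then its components separated by spaces
            f.write(word + " " + " ".join(str(float(x)) for x in vec) + "\n")

# Toy 3-dimensional vectors with made-up values
toy = {"police": [0.1, 0.2, 0.3], "officer": [0.1, 0.25, 0.28], "banana": [-0.4, 0.9, 0.0]}
path = os.path.join(tempfile.mkdtemp(), "toy_word2vec.txt.gz")
write_word2vec_text(path, toy)

# Read the decompressed text back to inspect it
with gzip.open(path, "rt", encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines[0])  # header line
print(lines[1])  # first word's row
```

Only words and numbers survive in this format, which is why whatever loads it (spacy or gensim) supplies its own classes and methods.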

I then load the model under a different name:

import spacy

nlp = spacy.load('../moj_word2vec.model/')

Something goes wrong, however: I can't use common commands on nlp that work on model.

For example, these work:

model.wv.most_similar('police')
model.vector_size

But these don't:

nlp.wv.most_similar('police')
AttributeError: 'English' object has no attribute 'wv'

nlp.most_similar('police')
AttributeError: 'English' object has no attribute 'most_similar'

nlp.vector_size
AttributeError: 'English' object has no attribute 'vector_size'

So something seems to have broken in the loading, or perhaps the saving. Could someone help, please?

gojomo · Accepted Answer · 2020-03-27T00:23:15

Nothing's broken - you just have the wrong expectations.

The models from spacy, as loaded into your nlp variable, won't support methods from gensim model classes.

It's a different library, with different code, classes, and API, and it doesn't use gensim code under the hood, even though it can import the plain set of vectors from the plain word2vec_format file.

(Compare, for example, the results of type(model) or type(model.wv) on your working gensim model with type(nlp) for the spacy object that's created later: totally different types, with different methods and properties.)
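To make the point concrete: a word2vec_format file carries only words and numbers, so any `most_similar`-style behavior is code that a particular library layers on top. The stdlib-only sketch below assumes the plain-text (non-binary) format, parses it into a dict, and ranks neighbors by cosine similarity. The helpers `load_word2vec_text` and `most_similar` are hypothetical; they illustrate the idea behind gensim's method, not its actual implementation.

```python
import io
import math

def load_word2vec_text(f):
    """Hypothetical helper: parse word2vec text format into {word: [floats]}."""
    count, dim = map(int, f.readline().split())
    vectors = {}
    for _ in range(count):
        word, *values = f.readline().split()
        vec = [float(v) for v in values]
        if len(vec) != dim:
            raise ValueError(f"row for {word!r} has {len(vec)} values, expected {dim}")
        vectors[word] = vec
    return vectors

def most_similar(vectors, word, topn=3):
    """Hypothetical helper: rank the other words by cosine similarity to `word`."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]
    query = unit(vectors[word])
    scores = []
    for other, vec in vectors.items():
        if other == word:
            continue
        scores.append((other, sum(a * b for a, b in zip(query, unit(vec)))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

# A tiny in-memory file in word2vec text format (made-up values)
data = io.StringIO(
    "3 2\n"
    "police 1.0 0.0\n"
    "officer 0.9 0.1\n"
    "banana 0.0 1.0\n"
)
vectors = load_word2vec_text(data)
print(most_similar(vectors, "police"))  # "officer" ranks above "banana"
```

A real library would vectorize this with array operations rather than loop in pure Python, but the file itself gives it nothing beyond the word-to-vector mapping.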

You'll have to use some combination of:

- checking the spacy docs for equivalent operations
- if you need the gensim operations, loading the vectors into a gensim model class. For example:

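from gensim.models.keyedvectors import KeyedVectors
filename = "../moj_word2vec.txt.gz"  # the gzipped word2vec-format file from the question
wv = KeyedVectors.load_word2vec_format(filename)
# then do gensim ops on the `wv` object, e.g.:
wv.most_similar('police')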

(You could also save the entire gensim Word2Vec model using its .save() method, which stores it in one or more files using Python pickling. It could then be reloaded into a gensim Word2Vec model with Word2Vec.load(). However, if you only need to look up individual word-vectors by word-key, you don't need the full model.)
