I trained a FastText model in Gensim. I want to use it to encode my sentences. Specifically, I want to use this feature from native FastText:

./fasttext print-word-vectors model.bin < queries.txt

How do I save the model in Gensim so that it is in the binary format that native FastText understands?

I am using FastText 0.1.0 and Gensim 3.4.0 under Python 3.4.3.

In essence, I need the inverse of load_binary_data() as described in the Gensim FastText documentation.

1 Answer

You probably won't find such functionality in Gensim, because it would require depending on fastText's internal structure and code, as fasttext-python does (it uses pybind to call the internal fastText API directly). The Gensim maintainers would likely prefer to avoid such a heavy dependency on an external library, which is probably why they deprecated the wrapper that called fastText. Right now, Gensim only aims to provide the fastText algorithm through its own internal implementation. I would suggest using the Python bindings for fastText instead.

$ git clone https://github.com/facebookresearch/fastText.git
$ cd fastText
$ pip install .

Now train the model on your training data from your Python application, using the fastText bindings.

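# Module name as installed by the steps above; later releases of the bindings use lowercase `fasttext`.
from fastText import train_unsupervised
# Train skip-gram vectors on a plain-text corpus file.
model = train_unsupervised(input="pathtotextfile", model='skipgram')
# Write the model in the binary format read by the fasttext command-line tool.
model.save_model('model.bin')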

This saves the model in the format used by the fastText command-line tool, so you should now be able to run the following:

$ ./fasttext print-word-vectors model.bin < queries.txt
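
If you would rather stay in Python than shell out to the binary, the same lookup can be sketched with the bindings. This is a minimal sketch, not the official tool: format_word_vectors, read_queries and main are hypothetical helper names, the output only approximates what print-word-vectors prints (the number formatting is an assumption), and it assumes the bindings are importable as fastText, as in the snippet above. The only binding calls used are load_model and get_word_vector.

```python
import sys


def read_queries(stream):
    """Yield whitespace-separated tokens from an iterable of text lines."""
    for line in stream:
        for token in line.split():
            yield token


def format_word_vectors(model, words):
    """Yield one 'word v1 v2 ...' line per word.

    `model` can be any object with a get_word_vector(word) method,
    such as a model loaded through the fastText Python bindings.
    """
    for word in words:
        vector = model.get_word_vector(word)
        # Assumption: 5 significant digits, chosen only to keep lines short.
        values = " ".join("{:.5g}".format(float(x)) for x in vector)
        yield word + " " + values


def main(model_path="model.bin"):
    # Assumption: module name as installed above; newer releases use `fasttext`.
    from fastText import load_model

    model = load_model(model_path)
    for line in format_word_vectors(model, read_queries(sys.stdin)):
        print(line)
```

Calling main() from a script and piping queries.txt to its standard input should then print one line per query word. Because the model was trained with subword information, get_word_vector also returns a vector for words that were not in the training vocabulary.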