
I have a question about fasttext (https://fasttext.cc/). I want to download a pre-trained model and use it to retrieve the word vectors from text.

After downloading the pre-trained model (https://fasttext.cc/docs/en/english-vectors.html) I unzipped it and got a .vec file. How do I import this into fasttext?

I've tried to use the mentioned function as follows:

import fasttext
import io

def load_vectors(fname):
    fin = io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore')
    n, d = map(int, fin.readline().split())
    data = {}
    for line in fin:
        tokens = line.rstrip().split(' ')
        data[tokens[0]] = map(float, tokens[1:])
    return data

vectors = load_vectors('/Users/username/Downloads/wiki-news-300d-1M.vec')
model = fasttext.load_model(vectors)

However, I can't run this code to completion because Python crashes. How can I successfully load these pre-trained word vectors?

Thank you for your help.

Please edit your question to specify whether there is an error message. - ygorg
How big is the vector file? How much RAM does your machine have? - dennlinger

1 Answer


fastText's advantage over word2vec or GloVe, for example, is that it uses subword information, so it can return vectors for out-of-vocabulary (OOV) words.

For that reason, the pretrained models come in two formats: .vec and .bin.

.vec is a plain-text dictionary, Dict[word, vector]: the word vectors are precomputed, but only for the words in the training vocabulary.

.bin is a binary fastText model that can be loaded with fasttext.load_model('file.bin'). It can provide word vectors for unseen (OOV) words, be trained further, etc.

In your case you are loading a .vec file, so vectors is already the "final form" of the data. fasttext.load_model expects the path to a .bin file, not a dictionary.
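In other words, the loader from the question works on its own once the `fasttext.load_model` call is dropped. Here is a minimal sketch, assuming the usual .vec layout (a header line with vocabulary size and dimension, then one word and its values per line). The two-word sample file and the name `sample_path` are made up so the snippet runs without the real download, and `list(...)` is added so each vector is stored as values instead of a lazy `map` object.

```python
import io
import os
import tempfile


def load_vectors(fname):
    """Read a fastText .vec text file into a dict of word -> list of floats."""
    data = {}
    with io.open(fname, 'r', encoding='utf-8', newline='\n', errors='ignore') as fin:
        # The first line is a header: vocabulary size and vector dimension.
        n, d = map(int, fin.readline().split())
        for line in fin:
            tokens = line.rstrip().split(' ')
            # list(...) materializes the values; in Python 3 a bare map()
            # is a one-shot iterator, not a reusable vector.
            data[tokens[0]] = list(map(float, tokens[1:]))
    return data


# Hypothetical two-word sample standing in for wiki-news-300d-1M.vec.
sample = "2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n"
with tempfile.NamedTemporaryFile('w', suffix='.vec', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write(sample)
    sample_path = tmp.name

vectors = load_vectors(sample_path)
os.remove(sample_path)

print(vectors['hello'])        # vector of an in-vocabulary word
print('fasttext' in vectors)   # words outside the file have no entry
```

With the full one-million-word file, a dictionary of Python float lists may need several gigabytes of RAM, so storing each row as a NumPy float32 array is probably a more compact choice.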

If you need more than a Python dictionary, you can use gensim.models.KeyedVectors, which can hold word vectors from any source (word2vec, GloVe, etc.); a .vec file can be loaded with KeyedVectors.load_word2vec_format.