I would like to load pretrained multilingual word embeddings from the fastText library with gensim; here is the link to the embeddings:
https://fasttext.cc/docs/en/crawl-vectors.html
In particular, I would like to load the following word embeddings:
- cc.de.300.vec (4.4 GB)
- cc.de.300.bin (7 GB)
Gensim offers the following two options for loading fastText files (see the usage sketch below the list):
gensim.models.fasttext.load_facebook_model(path, encoding='utf-8')
- Load the input-hidden weight matrix from Facebook’s native fasttext .bin output file.
- load_facebook_model() loads the full model, not just word embeddings, and enables you to continue model training.
gensim.models.fasttext.load_facebook_vectors(path, encoding='utf-8')
- Load word embeddings from a model saved in Facebook’s native fasttext .bin format.
- load_facebook_vectors() loads the word embeddings only. It's faster, but does not enable you to continue training.
Source: Gensim documentation, https://radimrehurek.com/gensim/models/fasttext.html#gensim.models.fasttext.load_facebook_model
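For concreteness, a minimal sketch of both calls (the path is a placeholder for wherever the downloaded .bin file lives; in practice I would pick only one of the two, since each call reads the whole file into RAM):

```python
from gensim.models.fasttext import load_facebook_model, load_facebook_vectors

path = "cc.de.300.bin"  # placeholder: path to the downloaded .bin file

# Option 1: full model (vectors plus hidden weights), training can be continued
model = load_facebook_model(path)

# Option 2: vectors only (a FastTextKeyedVectors object), faster and smaller,
# but training cannot be continued
word_vectors = load_facebook_vectors(path)
```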
Since my laptop has only 8 GB of RAM, I keep getting a MemoryError, or loading takes a very long time (up to several minutes).
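For reference, a minimal sketch of the kind of call that runs out of memory; for the plain-text .vec file I use gensim's generic word2vec text loader (the filename assumes the file sits in the working directory):

```python
from gensim.models import KeyedVectors

# cc.de.300.vec is in word2vec text format: ~2 million vectors x 300 dims.
# Loading materializes the full float32 matrix (~2.4 GB) plus vocabulary
# overhead in RAM at once, which is tight on an 8 GB machine.
german_vectors = KeyedVectors.load_word2vec_format("cc.de.300.vec", binary=False)
```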
Is there a way to load these large models from disk more memory-efficiently?