I'm starting a new project in which I want to represent words as vectors. I read about the fastText library and saw that it provides pre-trained models for languages other than English. The goal is to predict the closeness (similarity) between different words.
What I want to know is whether I can train a fastText model on non-English data, such as articles from news sites, to get better results for specific domains like politics and current events.
- Can I train it on non-English data sets?
- How long does it take to train a model on 10 GB of text? Is that big enough?
- Are there any better solutions?
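
For context, this is roughly what I have in mind, as a minimal sketch rather than working project code. It assumes the official `fasttext` Python package is installed; the function name `train_and_compare`, the corpus path, and the hyperparameter values are placeholders I made up for illustration. Closeness is measured here with plain cosine similarity, which is my assumption about how to compare two word vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as "not similar" instead of dividing by zero
    return float(dot / (norm_u * norm_v))


def train_and_compare(corpus_path, word_a, word_b):
    """Hypothetical helper: train skip-gram vectors on a text file, then compare two words."""
    import fasttext  # imported lazily so cosine_similarity works without the package

    # corpus_path is assumed to be a UTF-8 plain-text file (e.g. news articles),
    # already tokenized with spaces. dim and minCount are placeholder values.
    model = fasttext.train_unsupervised(corpus_path, model="skipgram", dim=100, minCount=5)
    return cosine_similarity(model.get_word_vector(word_a), model.get_word_vector(word_b))
```

I would call it as `train_and_compare("news_corpus.txt", word_a, word_b)` with two words from my language, where `news_corpus.txt` stands in for my own data.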
Thanks in advance!