Fine-tuning GloVe Embeddings

3

votes

Has anyone tried to fine-tune GloVe embeddings on a domain-specific corpus?
Fine-tuning word2vec embeddings has proven very effective for me in various NLP tasks, but I am wondering whether generating a co-occurrence matrix on my domain-specific corpus and training GloVe embeddings (initialized with pre-trained embeddings) on that corpus would yield similar improvements.

machine-learning nlp word2vec word-embedding

Why do you want to fine-tune GloVe specifically? Have you considered methods like this one, which work for any kind of general word embedding: arxiv.org/abs/1801.06146, nlp.fast.ai/category/classification.html - l.augustyniak

3

votes

I am trying to do the exact same thing myself. You can try mittens.

They have successfully built a framework for it. Christopher D. Manning (co-author of GloVe) is associated with it.
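Whatever tool you end up using, the domain-specific input is a word-word co-occurrence matrix built from your own corpus. A minimal sketch of building one in plain Python is below; the function name `build_cooccurrence`, the window size, and the toy corpus are my own illustrative choices, not part of mittens or GloVe, and a real corpus would need a sparse representation.

```python
from collections import Counter

def build_cooccurrence(sentences, window=2):
    """Count symmetric word-word co-occurrences within a fixed window."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = Counter()
    for sent in sentences:
        for i, word in enumerate(sent):
            # Look only to the right, then record both directions
            # so the resulting matrix is symmetric.
            for j in range(i + 1, min(i + 1 + window, len(sent))):
                a, b = index[word], index[sent[j]]
                counts[(a, b)] += 1
                counts[(b, a)] += 1
    n = len(vocab)
    # Dense list-of-lists matrix; fine for a toy corpus only.
    matrix = [[counts[(r, c)] for c in range(n)] for r in range(n)]
    return vocab, matrix

# Hypothetical two-sentence "domain" corpus, already tokenized.
corpus = [
    "the patient received a dose".split(),
    "the patient missed a dose".split(),
]
vocab, matrix = build_cooccurrence(corpus, window=2)
```

A matrix like this, together with the vocabulary and the pre-trained vectors for the words you already have, is the kind of input a fine-tuning setup needs; check the mittens README for its exact interface.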

1

votes

word2vec and GloVe are techniques for producing word embeddings, i.e., for modelling text (a set of sentences) as computer-readable vectors.

While word2vec trains on the local context (neighboring words), GloVe looks at word co-occurrence across a whole text or corpus, so its approach is more global.

word2vec

There are two main approaches for word2vec, in which the algorithm loops through the words of the sentence. For each current word w it will try to predict either

the neighboring words from w: this is the Skip-Gram approach
w from its context (the neighboring words): this is the CBOW approach
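To make the difference concrete, here is a small sketch of the training examples each approach would draw from one sentence. The helper names `skipgram_pairs` and `cbow_pairs` and the window size of 1 are assumptions for illustration; this only generates the (input, target) pairs and does not train anything.

```python
def skipgram_pairs(tokens, window=1):
    """(center, neighbor) pairs: predict each neighbor from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=1):
    """(context, center) pairs: predict the center word from its neighbors."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, center))
    return pairs

tokens = "ice is solid water".split()
sg = skipgram_pairs(tokens)  # one example per (word, neighbor) combination
cb = cbow_pairs(tokens)      # one example per word, with all its neighbors
```

Skip-Gram yields one example per neighbor, while CBOW yields one example per position with the whole context as input.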

Hence, word2vec will produce a similar embedding for words with similar contexts, for instance a noun in singular and its plural, or two synonyms.

GloVe

The main intuition underlying the GloVe model is the simple observation that ratios of word-word co-occurrence probabilities have the potential for encoding some form of meaning. In other words, the embeddings are based on the computation of distances between pairs of target words. The model computes the distance between two target words in a text by analyzing the co-occurrence of those two target words with some other probe words (contextual words).

https://nlp.stanford.edu/projects/glove/

For example, consider the co-occurrence probabilities for target words "ice" and "steam" with various probe words from the vocabulary. Here are some actual probabilities from a 6 billion word corpus:

(The table of probabilities is not reproduced here; see the page linked above.)

As one might expect, "ice" co-occurs more frequently with "solid" than it does with "gas", whereas "steam" co-occurs more frequently with "gas" than it does with "solid". Both words co-occur with their shared property "water" frequently, and both co-occur with the unrelated word "fashion" infrequently. Only in the ratio of probabilities does noise from non-discriminative words like "water" and "fashion" cancel out, so that large values (much greater than 1) correlate well with properties specific to "ice", and small values (much less than 1) correlate well with properties specific to "steam". In this way, the ratio of probabilities encodes some crude form of meaning associated with the abstract concept of thermodynamic phase.
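The ratio argument can be checked with a few lines of Python. The probabilities below are the rounded values as I recall them from Table 1 of the GloVe paper, so treat them as an assumption and verify them against the linked page; ratios computed from rounded inputs will also differ slightly from the published ones.

```python
# P(k | ice) and P(k | steam) for four probe words k.
# Assumed values (rounded, recalled from the GloVe paper's Table 1).
p_ice = {"solid": 1.9e-4, "gas": 6.6e-5, "water": 3.0e-3, "fashion": 1.7e-5}
p_steam = {"solid": 2.2e-5, "gas": 7.8e-4, "water": 2.2e-3, "fashion": 1.8e-5}

# Ratio P(k | ice) / P(k | steam) for each probe word.
ratios = {k: p_ice[k] / p_steam[k] for k in p_ice}

for k, r in ratios.items():
    print(f"{k:8s} P(k|ice)/P(k|steam) = {r:.3g}")
```

The ratio is well above 1 for "solid", well below 1 for "gas", and close to 1 for "water" and "fashion", which is the pattern the quoted passage describes.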

Also, GloVe is very good at analogies, and performs well on the word analogy dataset introduced with word2vec.
