
I trained a GloVe model in Python using Maciejkula's implementation (github repo). For the next step I need a word-to-embedding dictionary, but I can't seem to find an easy way to extract one from the GloVe model I trained.

I can extract the embeddings by accessing model.word_vectors, but this only returns an array of vectors without a mapping to the corresponding words. There is also the model.dictionary attribute, which contains word-to-index pairs. I thought these indices might correspond to the row indices of the model.word_vectors array, but I'm not sure that this is correct.

Do the indices correspond, or is there another easy way to get a word-to-embedding dictionary from a glove-python model?

I realize that Sanj asked a similar, though broader, question, but since there is no response yet I thought I'd ask this more specific one.


1 Answer


You are on the right track. NLP implementations usually avoid carrying words as strings through the algorithm; instead they use an indexing scheme, word -> idx, and that idx is used throughout for simplicity.

For this glove-python implementation, model.dictionary maps word -> idx, while model.word_vectors is a 2-D array in which row idx holds that word's vector.

e.g. to get the vector corresponding to the word 'samsung', you can use:

model.word_vectors[model.dictionary['samsung']]
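If you want the full word-to-embedding dictionary the question asks for, you can invert the mapping with a dict comprehension. A minimal sketch, assuming the two attributes behave as described above (the FakeGloveModel class here is a stand-in for your trained model, not part of glove-python):

```python
import numpy as np

class FakeGloveModel:
    """Stand-in for a trained glove-python model, for illustration only:
    dictionary maps word -> row index, word_vectors holds one row per word."""
    def __init__(self):
        self.dictionary = {'samsung': 0, 'apple': 1, 'nokia': 2}
        self.word_vectors = np.random.rand(3, 5)  # 3 words, 5-dim embeddings

model = FakeGloveModel()  # in practice, your trained Glove instance

# Each word's index selects its row in word_vectors, so iterating over
# model.dictionary pairs every word with its embedding.
word_to_embedding = {
    word: model.word_vectors[idx] for word, idx in model.dictionary.items()
}
```

After this, word_to_embedding['samsung'] is the same vector as model.word_vectors[model.dictionary['samsung']].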