How to invert a PyTorch Embedding?

Question

I have a multi-task encoder/decoder model in PyTorch with a (trainable) torch.nn.Embedding embedding layer at the input.

In one particular task, I'd like to pre-train the model self-supervised (to re-construct masked input data) and use it for inference (to fill in gaps in data).

I guess at training time I can just measure the loss as the distance between the input embedding and the output embedding... But for inference, how do I invert an Embedding to recover the category/token that the output corresponds to? I can't see e.g. a "nearest" function on the Embedding class...

To invert an Embedding to reconstruct the proper category/token the output corresponds to, you'd usually add a classifier over the output embedding (e.g. with a softmax) to find the predicted token or class. — stackoverflowuser2010
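The classifier approach from the comment above could look roughly like the sketch below. The sizes, the `head` layer, and the random `decoder_output` and `targets` tensors are illustrative assumptions, not part of the original question, and tying the head to the embedding weights is an optional design choice rather than something the commenter specified.

```python
import torch
import torch.nn as nn

vocab_size, dim = 1000, 100  # assumed sizes, for illustration only

embedding = nn.Embedding(vocab_size, dim)

# Hypothetical output head: one logit per token for each output vector.
head = nn.Linear(dim, vocab_size, bias=False)
# Optional weight tying: reuse the embedding matrix as the output projection.
# Both weights have shape (vocab_size, dim), so they can share one Parameter.
head.weight = embedding.weight

decoder_output = torch.randn(4, dim)          # stand-in for the model's output
targets = torch.randint(0, vocab_size, (4,))  # stand-in for the masked token ids

logits = head(decoder_output)                        # shape (4, vocab_size)
loss = nn.functional.cross_entropy(logits, targets)  # training: softmax + NLL
predicted = logits.argmax(dim=-1)                    # inference: most likely token
```

With this setup the training loss is a cross-entropy over tokens rather than a distance between embeddings, and inference reduces to an argmax over the logits.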

Szymon Maszke · Accepted Answer · 2020-10-25T16:33:08

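You can do this with a nearest-neighbour search over the embedding weights:

import torch

embeddings = torch.nn.Embedding(1000, 100)
my_sample = torch.randn(1, 100)  # stand-in for one output vector from the model
# Distance from the sample to every row of the embedding matrix, shape (1000,)
distance = torch.norm(embeddings.weight.detach() - my_sample, dim=1)
# Index of the closest embedding, i.e. the predicted token id
nearest = torch.argmin(distance)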

Assuming you have 1000 tokens with 100-dimensional embeddings, this returns the index of the nearest embedding by Euclidean distance. You could use other metrics, such as cosine similarity, in a similar manner.
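For a batch of output vectors, or for a different metric, the same lookup could be sketched as follows. The helper name `nearest_tokens` and its `metric` argument are hypothetical, introduced only for this example; `torch.cdist` and `torch.nn.functional.normalize` are existing PyTorch functions.

```python
import torch
import torch.nn.functional as F


def nearest_tokens(embedding, vectors, metric="euclidean"):
    """Return, for each row of `vectors`, the index of the closest embedding row."""
    weight = embedding.weight.detach()  # (num_tokens, dim), no gradient tracking
    if metric == "euclidean":
        # Pairwise distances, shape (batch, num_tokens); smallest wins.
        return torch.cdist(vectors, weight).argmin(dim=1)
    if metric == "cosine":
        # Cosine similarity via normalized dot products; largest wins.
        sims = F.normalize(vectors, dim=1) @ F.normalize(weight, dim=1).T
        return sims.argmax(dim=1)
    raise ValueError(f"unknown metric: {metric}")


embedding = torch.nn.Embedding(1000, 100)
tokens = torch.tensor([3, 42, 999])
# Slightly perturbed embeddings, standing in for imperfect model outputs.
outputs = embedding(tokens).detach() + 0.01 * torch.randn(3, 100)
recovered = nearest_tokens(embedding, outputs)
recovered_cosine = nearest_tokens(embedding, outputs, metric="cosine")
```

Note that the argmin/argmax step is not differentiable, so a lookup like this is suited to inference rather than to computing a training loss.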
