17
votes

I have a tensor of size 4 x 6 where 4 is batch size and 6 is sequence length. Every element of the sequence vectors are some index (0 to n). I want to create a 4 x 6 x n tensor where the vectors in 3rd dimension will be one hot encoding of the index which means I want to put 1 in the specified index and rest of the values will be zero.

For example, I have the following tensor:

[[5, 3, 2, 11, 15, 15],
[1, 4, 6, 7, 3, 3],
[2, 4, 7, 8, 9, 10],
[11, 12, 15, 2, 5, 7]]

Here, all the values are in between (0 to n) where n = 15. So, I want to convert the tensor to a 4 X 6 X 16 tensor where the third dimension will represent one hot encoding vector.

How can I do that using PyTorch functionalities? Right now, I am doing this with loop but I want to avoid looping!

3

3 Answers

25
votes

NEW ANSWER As of PyTorch 1.1, there is a one_hot function in torch.nn.functional. Given any tensor of indices indices and a maximal index n, you can create a one_hot version as follows:

n = 5
indices = torch.randint(0,n, size=(4,7))
one_hot = torch.nn.functional.one_hot(indices, n) # size=(4,7,n)

Very old Answer

At the moment, slicing and indexing can be a bit of a pain in PyTorch from my experience. I assume you don't want to convert your tensors to numpy arrays. The most elegant way I can think of at the moment is to use sparse tensors and then convert to a dense tensor. That would work as follows:

from torch.sparse import FloatTensor as STensor

batch_size = 4
seq_length = 6
feat_dim = 16

batch_idx = torch.LongTensor([i for i in range(batch_size) for s in range(seq_length)])
seq_idx = torch.LongTensor(list(range(seq_length))*batch_size)
feat_idx = torch.LongTensor([[5, 3, 2, 11, 15, 15], [1, 4, 6, 7, 3, 3],                            
                             [2, 4, 7, 8, 9, 10], [11, 12, 15, 2, 5, 7]]).view(24,)

my_stack = torch.stack([batch_idx, seq_idx, feat_idx]) # indices must be nDim * nEntries
my_final_array = STensor(my_stack, torch.ones(batch_size * seq_length), 
                         torch.Size([batch_size, seq_length, feat_dim])).to_dense()    

print(my_final_array)

Note: PyTorch is undergoing some work currently, that will add numpy style broadcasting and other functionalities within the next two or three weeks and other functionalities. So it's possible, there'll be better solutions available in the near future.

Hope this helps you a bit.

12
votes

The easiest way I found. Where x is a list of numbers and class_count is the amount of classes you have.

def one_hot(x, class_count):
    return torch.eye(class_count)[x,:]

Use it like this:

x = [0,2,5,4]
class_count = 8
one_hot(x,class_count)
tensor([[1., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 0.]])


3
votes

This can be done in PyTorch using the in-place scatter_ method for any Tensor object.

labels = torch.LongTensor([[[2,1,0]], [[0,1,0]]]).permute(0,2,1) # Let this be your current batch
batch_size, k, _ = labels.size()
labels_one_hot = torch.FloatTensor(batch_size, k, num_classes).zero_()
labels_one_hot.scatter_(2, labels, 1)

For num_classes=3 (the indices should vary from [0,3)), this will give you

(0 ,.,.) = 
  0  0  1
  0  1  0
  1  0  0
(1 ,.,.) = 
  1  0  0
  0  1  0
  1  0  0
[torch.FloatTensor of size 2x3x3]

Note that labels should be a torch.LongTensor.

PyTorch Docs Reference: torch.Tensor.scatter_