For the second question: hidden states at the padded timesteps will not be computed.
To see how that happens, let's first look at what pack_padded_sequence does for us:
import torch
from torch.nn.utils.rnn import pad_sequence, pad_packed_sequence, pack_padded_sequence

raw = [torch.ones(25, 300) / 2,
       torch.ones(22, 300) / 2.3,
       torch.ones(15, 300) / 3.2]
padded = pad_sequence(raw)  # size: [25, 3, 300]
lengths = torch.as_tensor([25, 22, 15], dtype=torch.int64)
packed = pack_padded_sequence(padded, lengths)
So far we have created three tensors with different lengths (i.e., different numbers of timesteps in the RNN context), padded them to a common length, and then packed the padded tensor (note that pack_padded_sequence expects the lengths in descending order unless you pass enforce_sorted=False). Now if we run
print(padded.size())
print(packed.data.size()) # packed.data refers to the "packed" tensor
we will see:
torch.Size([25, 3, 300])
torch.Size([62, 300])
Obviously 62 does not come from 25 * 3; it is 25 + 22 + 15, the sum of the lengths. So what pack_padded_sequence does is keep only the meaningful timesteps of each batch entry, according to the lengths tensor we passed to it (i.e. if we passed [25, 25, 25], the size of packed.data would be [75, 300] even though the raw tensors are unchanged). In short, with pack_padded_sequence the RNN never even sees the padded timesteps.
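We can verify this directly: a PackedSequence also stores a batch_sizes tensor recording how many batch entries are still active at each timestep. A quick check (my own illustration, reusing the packed and lengths defined above):

print(lengths.sum())             # tensor(62), matching packed.data.size(0)
print(packed.batch_sizes)        # 25 values: fifteen 3s, seven 2s, three 1s
print(packed.batch_sizes.sum())  # tensor(62), i.e. 45 + 14 + 3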
Now let's see the difference when we pass padded and packed to the RNN:
rnn = torch.nn.RNN(input_size=300, hidden_size=2)
padded_outp, padded_hn = rnn(padded) # size: [25, 3, 2] / [1, 3, 2]
packed_outp, packed_hn = rnn(packed) # 'PackedSequence' Obj / [1, 3, 2]
undo_packed_outp, _ = pad_packed_sequence(packed_outp)
# compare the h_n returned by the two calls
print(padded_hn) # tensor([[[-0.2329, -0.6179], [-0.1158, -0.5430], [ 0.0998, -0.3768]]])
print(packed_hn) # tensor([[[-0.2329, -0.6179], [ 0.5622, 0.1288], [ 0.5683, 0.1327]]])
# the output at the last timestep (the 25th)
print(padded_outp[-1]) # tensor([[-0.2329, -0.6179], [-0.1158, -0.5430], [ 0.0998, -0.3768]])
print(undo_packed_outp.data[-1]) # tensor([[-0.2329, -0.6179], [ 0.0000, 0.0000], [ 0.0000, 0.0000]])
The values of padded_hn and packed_hn differ because the RNN does compute over the padding for padded but not for packed (the PackedSequence object): padded_hn is the state after all 25 timesteps, padding included, while packed_hn holds the state at each entry's actual last timestep (the 25th, 22nd, and 15th respectively). The same can be observed from the output at the last timestep: with padded, all three batch entries get a non-zero output at timestep 25 even though two of them are shorter than 25; with packed, the output at timestep 25 is simply not computed for the shorter entries, so pad_packed_sequence fills it with 0.
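As a side note, if you need each entry's true last output from the padded output tensor, you can gather it with the lengths yourself. A minimal sketch (the gather-based indexing is my own addition, reusing packed_outp, packed_hn, and lengths from above):

outp, _ = pad_packed_sequence(packed_outp)           # size: [25, 3, 2]
idx = (lengths - 1).view(1, -1, 1).expand(1, 3, 2)   # each entry's last valid timestep
last = outp.gather(0, idx)                           # size: [1, 3, 2]
print(torch.allclose(last, packed_hn))               # True for this single-layer RNN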
P.S. Another observation:
print([(undo_packed_outp[:, i, :].sum(-1) != 0).sum() for i in range(3)])
would give us [tensor(25), tensor(22), tensor(15)], which aligns with the actual lengths of our inputs.
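Counting non-zero outputs is a roundabout way to recover the lengths, though: pad_packed_sequence already returns them as its second value (the one we discarded with _ above):

_, out_lengths = pad_packed_sequence(packed_outp)
print(out_lengths)  # tensor([25, 22, 15])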