2
votes

As my inputs are of variable length, I need to pad them all to the same size so I can feed them to a Bidirectional LSTM.

But what difference does pre-padding make compared to post-padding?

For example:

    input    [3, 2, 1, 2]
    pre-pad  [0, 0, 0, 3, 2, 1, 2]
    post-pad [3, 2, 1, 2, 0, 0, 0]

Which variant helps with better gradient flow?
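
For reference, this is how I generate the two variants with Keras' pad_sequences; a minimal sketch, assuming the tf.keras preprocessing API:

    from tensorflow.keras.preprocessing.sequence import pad_sequences

    sequences = [[3, 2, 1, 2], [5, 4], [7, 6, 8, 9, 1]]

    # Zeros prepended (this is Keras' default padding mode)
    pre_padded = pad_sequences(sequences, maxlen=7, padding='pre', value=0)

    # Zeros appended
    post_padded = pad_sequences(sequences, maxlen=7, padding='post', value=0)

    print(pre_padded[0])   # [0 0 0 3 2 1 2]
    print(post_padded[0])  # [3 2 1 2 0 0 0]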

1
I don't know why this question has been downvoted. It's a good question. Maybe you should specifically point out that you're talking about Keras' Bidirectional layer. – z0r
This issue on GitHub asks a similar question but it doesn't have an answer. – z0r

1 Answer

3
votes

Usually a recurrent network places higher emphasis on the information it has seen last. Therefore, whether you should use pre- or post-padding depends highly on your data and problem.

Consider the following example: you have an encoder-decoder architecture. The encoder reads the data and outputs some fixed-dimensional representation, while the decoder should do the reverse. For the encoder it would make sense to pre-pad the input, so that it doesn't just read padding at the end while forgetting the actual meaningful content it has seen before. For the decoder, on the other hand, post-padding might be better, as it should probably learn to produce some end-of-sequence token at the end and ignore the padding that follows anyway.
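
As a rough sketch of that encoder case (my own illustration with made-up layer sizes, assuming tf.keras), pre-padding keeps the meaningful tokens closest to the final state that becomes the fixed-size representation:

    from tensorflow.keras import layers, models
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    sequences = [[3, 2, 1, 2], [5, 4, 8]]
    vocab_size, max_len = 1000, 7          # made-up sizes, just for illustration

    # Pre-pad: the real tokens are read last, right before the summary is formed
    enc_input = pad_sequences(sequences, maxlen=max_len, padding='pre')

    encoder = models.Sequential([
        layers.Embedding(vocab_size, 32),
        layers.LSTM(64),                   # final hidden state = fixed-size summary
    ])

    summary = encoder(enc_input)
    print(summary.shape)                   # (2, 64)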

What is better suited for a Bidirectional LSTM is hard to say and might also depend on the actual problem in the end. In the simplest case, it shouldn't really matter.
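
For the Bidirectional layer specifically, a minimal sketch (again with made-up sizes) where you can simply try both padding variants against each other on your data:

    from tensorflow.keras import layers, models
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    sequences = [[3, 2, 1, 2], [5, 4, 8]]
    max_len = 7

    model = models.Sequential([
        layers.Embedding(1000, 32),
        layers.Bidirectional(layers.LSTM(64)),   # reads the sequence in both directions
        layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')

    # Compare validation performance of the two variants on your own task
    x_pre = pad_sequences(sequences, maxlen=max_len, padding='pre')
    x_post = pad_sequences(sequences, maxlen=max_len, padding='post')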