As my inputs are of variable length, I need to pad them all to the same size so that I can feed them to a Bidirectional LSTM.
But what difference does pre-padding make compared with post-padding?
For example:
input [3,2,1,2]
prepad [0,0,0,3,2,1,2]
postpad [3,2,1,2,0,0,0]
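
To make the two variants concrete, here is a minimal sketch in plain Python. The helper `pad` and its parameters (`max_len`, `mode`, `value`) are hypothetical names made up for illustration; they are not taken from any library.

```python
def pad(seq, max_len, mode="post", value=0):
    """Pad seq with `value` up to max_len (hypothetical helper).

    mode="pre"  puts the padding before the real tokens,
    mode="post" puts the padding after the real tokens.
    """
    if mode not in ("pre", "post"):
        raise ValueError("mode must be 'pre' or 'post'")
    # Number of padding values needed (none if seq is already long enough).
    n_pad = max(0, max_len - len(seq))
    filler = [value] * n_pad
    if mode == "pre":
        return filler + list(seq)
    return list(seq) + filler


x = [3, 2, 1, 2]
prepad = pad(x, 7, mode="pre")
postpad = pad(x, 7, mode="post")
print(prepad)   # [0, 0, 0, 3, 2, 1, 2]
print(postpad)  # [3, 2, 1, 2, 0, 0, 0]
```

One thing the sketch makes visible: reversing the pre-padded list gives a post-padded version of the reversed input (`prepad[::-1] == pad(x[::-1], 7, mode="post")`). So, assuming no masking is applied, the backward direction of a bidirectional layer would see the padding at the opposite end from the forward direction.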
Which variant gives better gradient flow?