- In a Bi-LSTM you have one LSTM unrolling from left to right (say LSTM1) on the input (say X) and another LSTM unrolling from right to left (say LSTM2).
- Assuming that your input size (`X.shape`) is `n X t X f`, where
  - `n`: batch size
  - `t`: sequence length (time-steps / number of unrollings)
  - `f`: number of features per time-step
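To make the shape concrete, here is a minimal sketch with hypothetical sizes (`n=32`, `t=5`, `f=15` are illustrative choices, not from the answer above):

```python
import numpy as np

# Hypothetical batch: 32 sequences, 5 time-steps each, 15 features per step
n, t, f = 32, 5, 15
X = np.random.rand(n, t, f)

print(X.shape)  # (32, 5, 15), i.e. n X t X f
```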
- Assume that we have a model with a single Bi-LSTM defined as below:

      model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(t, f)))
- In this case LSTM1 will return an output of size `n X t X 10` and LSTM2 will return an output of size `n X t X 10`.
- Now you have the below choices for how to combine the outputs of LSTM1 and LSTM2 at each time-step, using `merge_mode`:
  - `sum`: element-wise addition of the LSTM1 and LSTM2 outputs at each time-step, i.e. `n X t X 10` of LSTM1 + `n X t X 10` of LSTM2 = output of size `n X t X 10`
  - `mul`: element-wise multiplication of the LSTM1 and LSTM2 outputs at each time-step, which results in an output of size `n X t X 10`
  - `concat`: concatenation of the LSTM1 and LSTM2 outputs along the feature axis at each time-step, which results in an output of size `n X t X 10*2`
  - `ave`: element-wise average of the LSTM1 and LSTM2 outputs at each time-step, which results in an output of size `n X t X 10`
  - `None`: returns the LSTM1 and LSTM2 outputs as a list
- No activation function is applied after combining the outputs based on `merge_mode`. If you want to apply an activation, you have to define it explicitly in the model as a layer.
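As a minimal sketch of that last point (assuming TensorFlow's Keras, and `tanh` chosen purely for illustration): since `merge_mode` applies no activation after combining, you add one yourself as a separate layer.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Activation

model = Sequential()
# merge_mode='sum' adds the forward and backward outputs; no activation follows.
model.add(Bidirectional(LSTM(10, return_sequences=True),
                        input_shape=(5, 15), merge_mode='sum'))
# Apply the activation explicitly as its own layer.
model.add(Activation('tanh'))

print(model.output_shape)  # (None, 5, 10) -- activation does not change the shape
```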
Test code:

    from keras.models import Sequential
    from keras.layers import Bidirectional, LSTM

    # merge_mode='concat': feature size doubles (10 * 2 = 20)
    model = Sequential()
    model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(5, 15), merge_mode='concat'))
    assert model.layers[-1].output_shape == (None, 5, 20)

    # merge_mode='sum': feature size stays 10
    model = Sequential()
    model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(5, 15), merge_mode='sum'))
    assert model.layers[-1].output_shape == (None, 5, 10)

    # merge_mode='mul': feature size stays 10
    model = Sequential()
    model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(5, 15), merge_mode='mul'))
    assert model.layers[-1].output_shape == (None, 5, 10)
Note: You cannot use `merge_mode=None` inside a Sequential model, because each layer should return a tensor, but `None` returns a list, so you can't stack it up in the model. However, you can use it with the functional API of Keras.
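A minimal sketch of the functional-API case (assuming TensorFlow's Keras; the sizes `(5, 15)` match the test code above):

```python
from tensorflow.keras.layers import Input, LSTM, Bidirectional
from tensorflow.keras.models import Model

inp = Input(shape=(5, 15))
# merge_mode=None returns a list: [forward_output, backward_output]
fwd, bwd = Bidirectional(LSTM(10, return_sequences=True), merge_mode=None)(inp)
model = Model(inputs=inp, outputs=[fwd, bwd])

print(model.output_shape)  # two outputs, each of shape (None, 5, 10)
```

Because the two outputs are separate tensors, you can process the forward and backward directions with different downstream layers before combining them however you like.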