4
votes

I would like to know more details about the merge modes available when using a Bidirectional LSTM for sequence classification, especially the "concat" merge mode, which is still quite unclear to me.

From what I understand of this scheme:

[Figure: bidirectional RNN diagram, with a forward and a backward recurrent layer whose merged outputs pass through a sigmoid to produce y_t]

The output y_t is computed after passing the merged results of the forward and backward layers into the sigmoid function. It seems rather intuitive for the "add", "mul" and "average" merge modes, but I don't understand how the output y_t is computed when the "concat" merge mode is chosen. Indeed, with this merge mode we now have a vector instead of a single value before the sigmoid function.


2 Answers

7
votes
  1. In a Bi-LSTM you will have one LSTM unrolling from left to right (say LSTM1) on the input (say X) and another LSTM unrolling from right to left (say LSTM2).
  2. Assuming that your input size (X.shape) is n X t X f, where
    • n: batch size
    • t: sequence length (time-steps / number of unrollings)
    • f: number of features per time-step
  3. Assume that we have a model with a single Bi-LSTM defined as below:
model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(t, f)))
  4. In this case LSTM1 will return output of size n X t X 10 and LSTM2 will return output of size n X t X 10.
  5. You now have the below choices of how to combine the outputs of LSTM1 and LSTM2 at each time-step, using merge_mode:

sum: Add the LSTM1 output to the LSTM2 output at each timestep, i.e. n X t X 10 of LSTM1 + n X t X 10 of LSTM2 = output of size n X t X 10.

mul: Element-wise multiplication of the LSTM1 output with the LSTM2 output at each timestep, which results in output of size n X t X 10.

concat: Concatenation of the LSTM1 and LSTM2 outputs along the feature axis at each timestep, which results in output of size n X t X 10*2.

ave: Element-wise average of the LSTM1 and LSTM2 outputs at each timestep, which results in output of size n X t X 10.

None: Return the LSTM1 and LSTM2 outputs as a list.
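
A minimal NumPy sketch of the four combining rules, using random placeholder arrays of the shapes above (n=2, t=5 and 10 units are just illustrative values, not from the original answer):

import numpy as np

n, t, units = 2, 5, 10
out1 = np.random.randn(n, t, units)  # stands in for the LSTM1 output
out2 = np.random.randn(n, t, units)  # stands in for the LSTM2 output

merged_sum = out1 + out2                          # shape (n, t, 10)
merged_mul = out1 * out2                          # shape (n, t, 10)
merged_ave = (out1 + out2) / 2                    # shape (n, t, 10)
merged_concat = np.concatenate([out1, out2], -1)  # shape (n, t, 20)

assert merged_concat.shape == (n, t, 2 * units)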

No activation function is applied after combining the outputs based on merge_mode. If you want to apply an activation, you will have to explicitly define it in the model as a layer, as sketched below.
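
For example (a sketch assuming a per-timestep binary classification head; the Dense size and sigmoid here are illustrative choices, not part of the original answer):

from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, TimeDistributed, Dense

model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=True),
                        input_shape=(5, 15), merge_mode='concat'))
# The merged (None, 5, 20) sequence has no activation applied to it yet;
# this Dense layer supplies one explicitly, per timestep.
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
assert model.layers[-1].output_shape == (None, 5, 1)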

Test code

from keras.models import Sequential
from keras.layers import Bidirectional, LSTM

model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(5, 15), merge_mode='concat'))
assert model.layers[-1].output_shape == (None, 5, 20)

model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(5, 15), merge_mode='sum'))
assert model.layers[-1].output_shape == (None, 5, 10)

model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(5, 15), merge_mode='mul'))
assert model.layers[-1].output_shape == (None, 5, 10)
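
For completeness, the same shape check works for 'ave' (not in the original tests, but it follows the same pattern):

model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=True), input_shape=(5, 15), merge_mode='ave'))
assert model.layers[-1].output_shape == (None, 5, 10)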

Note:

You cannot use merge_mode=None inside a Sequential model because each layer should return a tensor, but None returns a list, so you can't stack it up in a model. However, you can use it with the functional API of Keras, for example:
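
A sketch with the functional API (the manual concatenate at the end is just one illustrative way to combine the two returned tensors):

from keras.models import Model
from keras.layers import Input, LSTM, Bidirectional, concatenate

inputs = Input(shape=(5, 15))
# merge_mode=None yields a list of two tensors: the forward and backward outputs
fwd, bwd = Bidirectional(LSTM(10, return_sequences=True), merge_mode=None)(inputs)
# combine them however you like, e.g. concatenate manually
merged = concatenate([fwd, bwd])
model = Model(inputs, merged)
assert model.output_shape == (None, 5, 20)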

1
vote

It is quite simple. Imagine that your forward LSTM layer returned a state like [0.1, 0.2, 0.3] and your backward LSTM layer yielded [0.4, 0.5, 0.6]. Then the concatenation (concat for brevity) is [0.1, 0.2, 0.3, 0.4, 0.5, 0.6], which is passed on to the activation layer, as sketched below.
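
To connect this back to the question of how a scalar y_t comes out of that vector: the sigmoid output layer is really a dense unit whose weight vector simply grows to match the concatenated length. A minimal NumPy sketch (the weights here are random placeholders, not anything from the original answer):

import numpy as np

h_fwd = np.array([0.1, 0.2, 0.3])   # forward LSTM state
h_bwd = np.array([0.4, 0.5, 0.6])   # backward LSTM state
h = np.concatenate([h_fwd, h_bwd])  # shape (6,) after 'concat'

# Hypothetical Dense(1) output layer: 6 weights instead of 3,
# so the sigmoid still produces a single scalar y_t.
W = np.random.randn(6)
b = 0.0
y_t = 1.0 / (1.0 + np.exp(-(W @ h + b)))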