14 votes

I have been following this post in order to implement an attention layer over my LSTM model.

Code for the attention layer:

from keras import backend as K
from keras.layers import Dense, Lambda, Permute, RepeatVector, Reshape
from keras.layers import merge  # legacy Keras 1.x API; keras.layers.multiply in Keras 2

INPUT_DIM = 2
TIME_STEPS = 20
SINGLE_ATTENTION_VECTOR = False
APPLY_ATTENTION_BEFORE_LSTM = False

def attention_3d_block(inputs):
    # inputs.shape = (batch_size, TIME_STEPS, input_dim)
    input_dim = int(inputs.shape[2])
    a = Permute((2, 1))(inputs)              # (batch_size, input_dim, TIME_STEPS)
    a = Reshape((input_dim, TIME_STEPS))(a)  # no-op reshape; just documents the shape
    a = Dense(TIME_STEPS, activation='softmax')(a)  # one attention weight per time step
    if SINGLE_ATTENTION_VECTOR:
        # Average the attention over the feature axis and share one vector
        a = Lambda(lambda x: K.mean(x, axis=1), name='dim_reduction')(a)
        a = RepeatVector(input_dim)(a)
    a_probs = Permute((2, 1), name='attention_vec')(a)
    # Weight the inputs element-wise by the attention probabilities
    output_attention_mul = merge([inputs, a_probs],
                                 name='attention_mul', mode='mul')
    return output_attention_mul

The error I get:

File "main_copy.py", line 244, in model = create_model(X_vocab_len, X_max_len, y_vocab_len, y_max_len, HIDDEN_DIM, LAYER_NUM) File "main_copy.py", line 189, in create_model attention_mul = attention_3d_block(temp) File "main_copy.py", line 124, in attention_3d_block a = Permute((2, 1))(inputs) File "/root/.virtualenvs/keras_tf/lib/python3.5/site-packages/keras/engine/topology.py", line 597, in call output_mask = self.compute_mask(inputs, previous_mask) File "/root/.virtualenvs/keras_tf/lib/python3.5/site-packages/keras/engine/topology.py", line 744, in compute_mask str(mask)) TypeError: Layer permute_1 does not support masking, but was passed an input_mask: Tensor("merge_2/All:0", shape=(?, 15), dtype=bool)

I went through this thread, which says:

It is a small change in the Keras source code (set the supports_masking class variable in the Lambda layer to True instead of False). Otherwise there isn't a way to do this. Masking isn't really necessary though.

Where can I set the supports_masking variable to True? Also, is there any other solution to this?


1 Answer

0 votes

I'd say: don't use masking.

There is something rather questionable about that implementation: it applies a Dense layer along the time dimension (TIME_STEPS), which masking effectively makes variable.

That would require a variable number of weights in the layer, which is simply not possible. (With masking, you would be telling the layer that a different subset of its weights should be ignored for each sample.)
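As a quick illustration (using toy shapes matching the question's constants): the Dense layer after the Permute acts on the time axis, so its kernel is built with shape (TIME_STEPS, TIME_STEPS) and the number of time steps is frozen at build time:

from keras.layers import Input, Permute, Dense
from keras.models import Model

TIME_STEPS = 20
INPUT_DIM = 2

x = Input(shape=(TIME_STEPS, INPUT_DIM))
a = Permute((2, 1))(x)                          # (batch, INPUT_DIM, TIME_STEPS)
a = Dense(TIME_STEPS, activation='softmax')(a)  # acts on the last axis
model = Model(x, a)

# The kernel size is fixed when the layer is built, which is why
# the time axis cannot vary from sample to sample:
print(model.layers[-1].get_weights()[0].shape)  # (20, 20)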

Instead, I'd add a token/word to the inputs that means "this is the end of the sentence/movie/sequence" and pad the remaining length with that token. Then turn off or remove masking wherever you used it in your model (either the mask_zero parameter when you declared your Embedding layers, or actual Masking layers), as sketched below.
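A minimal sketch of that approach (the names and sizes here are assumptions, not your actual code): pad every sequence to one fixed length with a reserved filler token, and leave mask_zero at its default of False so no mask is ever produced:

from keras.preprocessing.sequence import pad_sequences
from keras.layers import Embedding

VOCAB_SIZE = 1000   # hypothetical vocabulary size
MAX_LEN = 15        # assumed from the (?, 15) mask in your traceback
EMBED_DIM = 64      # hypothetical embedding size

sequences = [[5, 8, 2], [7, 3, 9, 4, 1]]  # toy token-id sequences

# Reserve id 0 as the "end of sequence" filler and pad everything
# to the same fixed length.
padded = pad_sequences(sequences, maxlen=MAX_LEN, padding='post', value=0)

# mask_zero=False (the default) means no mask is ever created,
# so the Permute/Dense layers downstream never complain.
embedding = Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM,
                      mask_zero=False)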


As for editing Keras itself: changing the library's native code may result in unstable behavior and wrong results (if not outright errors).

There is a reason masking is not supported in such layers, much like the explanation above about the Dense layer. If you force it on, who knows what could go wrong? Never mess with the source code unless you're really, really sure of all the consequences.


If you still want to use masking despite all this, there are some more involved workarounds I found (but didn't test), such as this MaskEatingLambda layer:
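For reference, here is a minimal sketch of the idea (my own reconstruction, untested, not the code from that link): a Lambda subclass that declares supports_masking = True and then simply discards the incoming mask, so layers after it never receive one.

from keras.layers import Lambda

class MaskEatingLambda(Lambda):
    """A Lambda that accepts an incoming mask and swallows it."""

    def __init__(self, function, **kwargs):
        super(MaskEatingLambda, self).__init__(function, **kwargs)
        self.supports_masking = True  # accept masked inputs without raising

    def compute_mask(self, inputs, mask=None):
        # Discard the mask so downstream layers (Permute, Dense, ...)
        # are never passed one.
        return None

# Hypothetical usage: strip the mask right before the attention block.
# unmasked = MaskEatingLambda(lambda x: x)(masked_tensor)
# attention_mul = attention_3d_block(unmasked)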