
I am building a simple LSTM model as follows:

from keras.models import Sequential
from keras.layers import LSTM, Activation

model = Sequential()
model.add(LSTM(10, return_sequences=False, input_shape=(8, 8)))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Here, my input is an ndarray of shape (8, 8), i.e. 8 timesteps with 8 features each. When I dump the weights of the trained model, I get:

print(model.layers[0].get_weights()[0].shape) # W (kernel): [W_i, W_f, W_c, W_o]
print(model.layers[0].get_weights()[1].shape) # U (recurrent kernel)
print(model.layers[0].get_weights()[2].shape) # b (bias)

Outputs:

(8, 40)
(10, 40)
(40,)

So W is a concatenation of W_i, W_f, W_c and W_o, each of shape (8, 10). But this doesn't match the forget-gate equation:

f_t = sigmoid( W_f * x + U_f * h_{t-1} + b_f )

If I write out just the matrix dimensions of the above equation, I get:

W_f' * x + U_f' * h_{t-1} + b_f 
    --> [10, 8] x [8, 8] + [10, 10] x [10, 1] + [10, 1] 
    --> [10, 8] + [10, 1] + [10, 1]

Looking at this, the shape of x (the input tensor) seems incorrect: only a vector input fits the equation. Can someone help me understand how the equation works when the input is 2-D?
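For reference, here is how I split the combined kernels into the per-gate matrices (a minimal sketch; the gate ordering i, f, c, o matches the comment above):

import numpy as np

W, U, b = model.layers[0].get_weights()
units = 10

# Slice the combined kernels into the four gate blocks (order: i, f, c, o)
W_i, W_f, W_c, W_o = [W[:, k * units:(k + 1) * units] for k in range(4)]
U_i, U_f, U_c, U_o = [U[:, k * units:(k + 1) * units] for k in range(4)]
b_i, b_f, b_c, b_o = [b[k * units:(k + 1) * units] for k in range(4)]

print(W_f.shape, U_f.shape, b_f.shape)  # (8, 10) (10, 10) (10,)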

TIA


1 Answer


The equation you mentioned is for computing the output at the t-th timestep. Therefore, only the input at timestep t (i.e. x_t) is used, not all the inputs (i.e. x):

f_t = sigmoid( W_f * x_{t} + U_f * h_{t-1} + b_f )

As a result we would have:

W_f' * x_t + U_f' * h_{t-1} + b_f 
    --> [10, 8] x [8, 1] + [10, 10] x [10, 1] + [10, 1] 
    --> [10, 1] + [10, 1] + [10, 1]
    --> [10, 1] # output at timestep t

And this is in harmony with what LSTM layers are meant to do: at each timestep t they take the input x_t and produce an output based on that input and the state resulting from processing timesteps 1 to t-1.
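To make this concrete, here is a minimal NumPy sketch (assuming the model and the weight slicing from the question) that computes the forget gate for a single timestep and confirms the shapes. Note that Keras stores the weights for the row-vector convention (x_t @ W_f), which is just the transpose of the column-vector form (W_f' * x_t) used above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W, U, b = model.layers[0].get_weights()
units = 10

# Forget gate is the second block in Keras's (i, f, c, o) ordering
W_f = W[:, units:2 * units]   # (8, 10)
U_f = U[:, units:2 * units]   # (10, 10)
b_f = b[units:2 * units]      # (10,)

x = np.random.rand(8, 8)      # one sample: 8 timesteps x 8 features
h_prev = np.zeros(units)      # hidden state from timestep t-1

x_t = x[0]                    # input at a single timestep, shape (8,)
f_t = sigmoid(x_t @ W_f + h_prev @ U_f + b_f)
print(f_t.shape)              # (10,) -- one activation per LSTM unit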