7 votes

On https://keras.io/layers/recurrent/ I see that LSTM layers have a kernel and a recurrent_kernel. What do they mean? In my understanding, we need weights for the four gates of an LSTM cell. However, in the Keras implementation, kernel has a shape of (input_dim, 4*units) and recurrent_kernel has a shape of (units, 4*units). So, are both of them somehow implementing the gates?
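
For concreteness, here is a minimal sketch of how I am inspecting the shapes (assuming the tf.keras API; the sizes input_dim = 5, units = 3, and 10 timesteps are made up):

import tensorflow as tf

# Toy sizes: input_dim = 5, units = 3
layer = tf.keras.layers.LSTM(3)
layer.build(input_shape=(None, 10, 5))  # (batch, timesteps, input_dim)

kernel, recurrent_kernel, bias = layer.get_weights()
print(kernel.shape)            # (5, 12) == (input_dim, 4 * units)
print(recurrent_kernel.shape)  # (3, 12) == (units, 4 * units)
print(bias.shape)              # (12,)   == (4 * units,)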

1 Answer

14 votes

Correct me if I'm wrong, but if you take a look at the LSTM equations:

i_t = σ(W_i x_t + U_i h_{t-1} + b_i)        (input gate)
f_t = σ(W_f x_t + U_f h_{t-1} + b_f)        (forget gate)
c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)     (candidate cell state)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)        (output gate)
c_t = f_t ∘ c_{t-1} + i_t ∘ c̃_t
h_t = o_t ∘ tanh(c_t)

You have four W matrices that transform the input x_t and four U matrices that transform the hidden state h_{t-1}, one pair per gate.
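
To make this concrete, here is a minimal NumPy sketch of a single LSTM step written with the eight separate matrices (the dict-of-matrices layout and the names W, U, b are my own illustration, not how Keras stores them):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b each map a gate name (i, f, c, o) to its weights."""
    i = sigmoid(x_t @ W["i"] + h_prev @ U["i"] + b["i"])        # input gate
    f = sigmoid(x_t @ W["f"] + h_prev @ U["f"] + b["f"])        # forget gate
    c_tilde = np.tanh(x_t @ W["c"] + h_prev @ U["c"] + b["c"])  # candidate cell state
    o = sigmoid(x_t @ W["o"] + h_prev @ U["o"] + b["o"])        # output gate
    c = f * c_prev + i * c_tilde                                # new cell state
    h = o * np.tanh(c)                                          # new hidden state
    return h, c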

Keras stores these two sets of four matrices in the kernel and recurrent_kernel weight arrays, respectively. From the Keras source code that uses them:

self.kernel_i = self.kernel[:, :self.units]                     # input gate (W_i)
self.kernel_f = self.kernel[:, self.units: self.units * 2]      # forget gate (W_f)
self.kernel_c = self.kernel[:, self.units * 2: self.units * 3]  # candidate cell state (W_c)
self.kernel_o = self.kernel[:, self.units * 3:]                 # output gate (W_o)

self.recurrent_kernel_i = self.recurrent_kernel[:, :self.units]                     # U_i
self.recurrent_kernel_f = self.recurrent_kernel[:, self.units: self.units * 2]      # U_f
self.recurrent_kernel_c = self.recurrent_kernel[:, self.units * 2: self.units * 3]  # U_c
self.recurrent_kernel_o = self.recurrent_kernel[:, self.units * 3:]                 # U_o

So the four per-gate matrices are concatenated along the second axis inside each weight array, which explains the shapes (input_dim, 4*units) and (units, 4*units).
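
A quick way to convince yourself of the equivalence (a sketch with made-up shapes, not Keras internals): slicing the concatenated kernel recovers the four per-gate matrices, and one large matmul against it produces the same numbers as four small ones. Storing the gates concatenated lets the layer do a single big matrix multiplication per step instead of four.

import numpy as np

rng = np.random.default_rng(0)
input_dim, units = 5, 3

kernel = rng.standard_normal((input_dim, 4 * units))  # like the Keras kernel
x = rng.standard_normal((1, input_dim))               # one input vector

# One matmul against the concatenated kernel...
z = x @ kernel  # shape (1, 4 * units)

# ...equals four matmuls against the per-gate slices (order: i, f, c, o).
gates = [kernel[:, k * units:(k + 1) * units] for k in range(4)]
z_split = np.concatenate([x @ g for g in gates], axis=1)

assert np.allclose(z, z_split)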