On https://keras.io/layers/recurrent/ I see that LSTM layers have a kernel and a recurrent_kernel. What is their meaning? In my understanding, we need weights for the 4 gates of an LSTM cell. However, in the Keras implementation, kernel has a shape of (input_dim, 4*units) and recurrent_kernel has a shape of (units, 4*units). So, are both of them somehow implementing the gates?
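For concreteness, these shapes are easy to inspect directly. A minimal tf.keras sketch (the sizes 8 and 16 are arbitrary; standalone Keras behaves the same way):

import tensorflow as tf

units, input_dim = 16, 8
layer = tf.keras.layers.LSTM(units)
layer(tf.zeros((1, 5, input_dim)))  # call once on dummy input so the layer builds its weights

kernel, recurrent_kernel, bias = layer.get_weights()
print(kernel.shape)            # (8, 64)  == (input_dim, 4 * units)
print(recurrent_kernel.shape)  # (16, 64) == (units, 4 * units)
print(bias.shape)              # (64,)    == (4 * units,)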
1 Answer
Correct me if I'm wrong, but take a look at the LSTM equations (standard formulation):
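$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

where $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication.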
You have 4 W matrices that transform the input x_t and 4 U matrices that transform the previous hidden state h_{t-1}.
Keras stores each set of four matrices concatenated into the kernel and recurrent_kernel weight arrays. From the code that uses them:
self.kernel_i = self.kernel[:, :self.units]                      # input gate
self.kernel_f = self.kernel[:, self.units: self.units * 2]      # forget gate
self.kernel_c = self.kernel[:, self.units * 2: self.units * 3]  # candidate cell state
self.kernel_o = self.kernel[:, self.units * 3:]                 # output gate

self.recurrent_kernel_i = self.recurrent_kernel[:, :self.units]
self.recurrent_kernel_f = self.recurrent_kernel[:, self.units: self.units * 2]
self.recurrent_kernel_c = self.recurrent_kernel[:, self.units * 2: self.units * 3]
self.recurrent_kernel_o = self.recurrent_kernel[:, self.units * 3:]
So the four per-gate matrices are stored concatenated along the second axis: kernel holds the four W matrices as (input_dim, 4*units) and recurrent_kernel holds the four U matrices as (units, 4*units), which explains the weight array shapes. Both arrays together implement the gates.
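To see how both arrays implement the gates together, here is a minimal NumPy sketch of a single LSTM step using exactly this slicing. It is not the Keras implementation itself: it assumes plain sigmoid gate activations, ignores details such as dropout, and the names (lstm_step and friends) are illustrative.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, kernel, recurrent_kernel, bias, units):
    # Two matrix products compute the pre-activations of all four gates at once:
    # kernel transforms the input, recurrent_kernel transforms the previous hidden state.
    z = x @ kernel + h_prev @ recurrent_kernel + bias  # shape: (4 * units,)

    i = sigmoid(z[:units])                      # input gate
    f = sigmoid(z[units: 2 * units])            # forget gate
    c_tilde = np.tanh(z[2 * units: 3 * units])  # candidate cell state
    o = sigmoid(z[3 * units:])                  # output gate

    c = f * c_prev + i * c_tilde  # new cell state
    h = o * np.tanh(c)            # new hidden state
    return h, c

Note how x @ kernel and h_prev @ recurrent_kernel each produce all four gate pre-activations in one shot; the slicing then recovers the per-gate pieces, mirroring the Keras code above.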