I am a little bit confused about how LSTM layers handle their input. As we all know, the input of an LSTM model in Keras has the shape (batch_size, timesteps, input_dim). My data is a time series, where each sequence of n time steps is fed in to predict the value at time step n+1. So how does the LSTM access the input? Does it process each time step in the sequence one by one, or does it have access to all of them at the same time?

When I check the number of parameters of an LSTM layer, I get 4*d*(n+d), where n is the dimension of the input and d is the number of memory cells. In my case I have d=10, and the number of parameters is 440 (without bias). So it means n=1 here, and it seems like the input has dimension 1*1. Then do they have access to all of the time steps simultaneously? Does anyone have any ideas about this?
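For reference, this is a minimal way to reproduce that parameter count (I'm assuming TensorFlow's Keras here; the sizes are just the ones from my setup):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

# d = 10 memory cells, input_dim = 1, arbitrary number of time steps
model = Sequential([LSTM(10, use_bias=False, input_shape=(None, 1))])
print(model.count_params())  # 440 = 4 * d * (n + d) = 4 * 10 * (1 + 10)
```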
1 Answer
First, think of a convolutional layer (it's easier).
It has parameters that depend only on the "filter size", the "input channels", and the "number of filters", but never on the "size of the image".
That happens because a convolution is somewhat of a "sliding" operation: the same group of filters is applied throughout the image. The total number of operations increases with the size of the image, but the parameters, which only define the filters, are independent of the image size. (Imagine a filter that detects a circle: this filter doesn't need to change to detect circles in different parts of the image, even though it is applied at each step across the entire image.)
So:
- Parameters: number of filters * filter size² * input channels
- Calculation steps: size of image (considering strides, padding, etc.)
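If you want to check this yourself, here is a minimal sketch (assuming TensorFlow's Keras API; the filter and image sizes are arbitrary choices for the example):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D

# the same 8 filters of size 3x3 over 3 input channels, on two image sizes
for size in (32, 256):
    model = Sequential([Conv2D(8, (3, 3), input_shape=(size, size, 3))])
    print(size, model.count_params())  # 224 both times: 8 * 3*3 * 3 + 8 biases
```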
With LSTM layers, a similar thing happens. The parameters are related to what they call "gates". (Take a look here)
There is a "state", and there are "gates" that are applied at each time iteration to determine how the state will change.
The gates are not time dependent, though. The computation does iterate over the time steps, but every iteration uses the same group of gates.
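To make that concrete, here is a rough NumPy sketch of the recurrence (the gate ordering and the single fused weight matrix are simplifications, and real implementations also add biases):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n, d, timesteps = 1, 10, 5
W = np.random.randn(n + d, 4 * d)   # the only weights: 4*d*(n+d) = 440 parameters
h, c = np.zeros(d), np.zeros(d)     # hidden state and cell state
x_seq = np.random.randn(timesteps, n)

for x in x_seq:                      # one iteration per time step
    z = np.concatenate([x, h]) @ W   # the SAME W is reused at every step
    i, f, g, o = np.split(z, 4)      # input, forget, candidate, output gates
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
```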
Comparing to the convolutional layers:
- Parameters: number of cells (d) and input dimension (n), which gives the 4*d*(n+d) from the question (plus 4*d biases, if used)
- Calculation steps: number of time steps
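So the layer reads the sequence one time step at a time, reusing the same gates, and the parameter count never depends on the sequence length. You can verify that directly (again assuming TensorFlow's Keras API):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

# same layer as in the question, fed sequences of different lengths
for timesteps in (5, 100):
    model = Sequential([LSTM(10, use_bias=False, input_shape=(timesteps, 1))])
    print(timesteps, model.count_params())  # 440 in both cases
```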