Well, I think it is better to reshape your data to (time, lats, lons, features), i.e. a timeseries of multi-channel spatial maps (one channel per feature):
import numpy as np
data = np.transpose(data, [3, 1, 2, 0])  # assumes data is (features, lats, lons, time)
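As a quick sanity check of the axis order, the sketch below applies the same permutation to a small dummy array. It assumes the original layout is (features, lats, lons, time), which is what the permutation [3, 1, 2, 0] implies; the sizes are made up for illustration.

```python
import numpy as np

# Hypothetical sizes, chosen small so the check runs instantly.
features, lats, lons, time = 4, 6, 8, 10

# Assumed original layout: (features, lats, lons, time).
data = np.arange(features * lats * lons * time).reshape(features, lats, lons, time)

# Same permutation as above: new axis i is taken from old axis perm[i].
reshaped = np.transpose(data, [3, 1, 2, 0])

print(reshaped.shape)  # (10, 6, 8, 4), i.e. (time, lats, lons, features)

# The value at (feature f, lat i, lon j, time t) moves to (t, i, j, f).
assert reshaped[7, 2, 5, 3] == data[3, 2, 5, 7]
```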
Then you can easily wrap Conv2D and MaxPooling2D layers inside a TimeDistributed layer to process the (multi-channel) maps at each timestep:
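from keras.models import Sequential
from keras.layers import (TimeDistributed, Conv2D, MaxPooling2D, Flatten,
                          LSTM, Reshape, UpSampling2D)

num_steps = 50
lats = 128
lons = 128
features = 4
out_feats = 3
model = Sequential()
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same'),
                          input_shape=(num_steps, lats, lons, features)))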
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
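To see how the spatial size shrinks through these three Conv2D/MaxPooling2D pairs, here is a small shape-tracing sketch in plain Python. The helper names conv_same and max_pool are made up for illustration; they only mimic the shape rules of the layers and do no actual computation.

```python
def conv_same(shape, filters):
    """Conv2D with padding='same' and stride 1 keeps height/width, sets channels."""
    h, w, _ = shape
    return (h, w, filters)

def max_pool(shape, pool=2):
    """MaxPooling2D with its default 'valid' padding floors the division."""
    h, w, c = shape
    return (h // pool, w // pool, c)

num_steps = 50
shape = (128, 128, 4)  # (lats, lons, features) at one timestep
for filters in (16, 32, 32):
    shape = max_pool(conv_same(shape, filters))
    print((num_steps,) + shape)
# (50, 64, 64, 16)
# (50, 32, 32, 32)
# (50, 16, 16, 32)

encoder_shape = (num_steps,) + shape
```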
So far we have a tensor of shape (50, 16, 16, 32). Then we can use a Flatten layer (wrapped in a TimeDistributed layer, of course, so as not to lose the time axis) and feed the result to one or more LSTM layers (with return_sequences=True to get the output at each timestep):
model.add(TimeDistributed(Flatten()))
# you may stack multiple LSTM layers on top of each other here
model.add(LSTM(units=64, return_sequences=True))
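For intuition, flattening every axis except time can be reproduced with a plain NumPy reshape on a single sample. This is only an illustration of the resulting shape, not how Keras implements it.

```python
import numpy as np

# One sample coming out of the last pooling layer: (time, height, width, channels).
encoded = np.zeros((50, 16, 16, 32), dtype=np.float32)

# Flatten everything except the time axis, which is the effect of wrapping
# Flatten in TimeDistributed for each sample.
flat = encoded.reshape(encoded.shape[0], -1)

print(flat.shape)  # (50, 8192): 50 timesteps with 16 * 16 * 32 features each
```

The LSTM therefore receives a sequence of 50 vectors with 8192 features each.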
Then we need to go back up. So we first reshape the output of the LSTM layers back into 2D maps and then use a combination of UpSampling2D and Conv2D layers to recover the original map's shape:
model.add(TimeDistributed(Reshape((8, 8, 1))))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(32, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(32, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(16, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(out_feats, (3,3), padding='same')))
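Two things have to line up in this decoder: the Reshape target must contain exactly as many elements as the LSTM has units, and the number of upsampling steps must bring the map back to 128 x 128. A rough arithmetic check in plain Python (shapes only, no Keras involved):

```python
lstm_units = 64
reshape_target = (8, 8, 1)

# Reshape only works if the element count matches the LSTM output size.
h, w, c = reshape_target
assert h * w * c == lstm_units

# Each UpSampling2D((2, 2)) doubles height and width; the Conv2D with
# padding='same' that follows sets the channel count.
shape = reshape_target
for filters in (32, 32, 16, 3):  # the last value is out_feats
    h, w, _ = shape
    shape = (h * 2, w * 2, filters)
    print(shape)
# (16, 16, 32)
# (32, 32, 32)
# (64, 64, 16)
# (128, 128, 3)

decoder_shape = shape
```

If you change the number of LSTM units or the input resolution, both the Reshape target and the number of UpSampling2D steps need to be adjusted accordingly.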
Here is the model summary:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
time_distributed_132 (TimeDi (None, 50, 128, 128, 16)  592
_________________________________________________________________
time_distributed_133 (TimeDi (None, 50, 64, 64, 16)    0
_________________________________________________________________
time_distributed_134 (TimeDi (None, 50, 64, 64, 32)    4640
_________________________________________________________________
time_distributed_135 (TimeDi (None, 50, 32, 32, 32)    0
_________________________________________________________________
time_distributed_136 (TimeDi (None, 50, 32, 32, 32)    9248
_________________________________________________________________
time_distributed_137 (TimeDi (None, 50, 16, 16, 32)    0
_________________________________________________________________
time_distributed_138 (TimeDi (None, 50, 8192)          0
_________________________________________________________________
lstm_13 (LSTM)               (None, 50, 64)            2113792
_________________________________________________________________
time_distributed_139 (TimeDi (None, 50, 8, 8, 1)       0
_________________________________________________________________
time_distributed_140 (TimeDi (None, 50, 16, 16, 1)     0
_________________________________________________________________
time_distributed_141 (TimeDi (None, 50, 16, 16, 32)    320
_________________________________________________________________
time_distributed_142 (TimeDi (None, 50, 32, 32, 32)    0
_________________________________________________________________
time_distributed_143 (TimeDi (None, 50, 32, 32, 32)    9248
_________________________________________________________________
time_distributed_144 (TimeDi (None, 50, 64, 64, 32)    0
_________________________________________________________________
time_distributed_145 (TimeDi (None, 50, 64, 64, 16)    4624
_________________________________________________________________
time_distributed_146 (TimeDi (None, 50, 128, 128, 16)  0
_________________________________________________________________
time_distributed_147 (TimeDi (None, 50, 128, 128, 3)   435
=================================================================
Total params: 2,142,899
Trainable params: 2,142,899
Non-trainable params: 0
_________________________________________________________________
As you can see, we have an output tensor of shape (50, 128, 128, 3), where 3 refers to the number of labels we want to predict for each location at each timestep.
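The parameter counts in the summary can be reproduced by hand with the standard formulas for a Conv2D layer and an LSTM layer. The helper names below are made up for this sketch.

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Weights (k * k * in * out) plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch

def lstm_params(input_dim, units):
    """Four gate blocks, each with input weights, recurrent weights and a bias."""
    return 4 * ((input_dim + units) * units + units)

# (in_channels, out_channels) of each Conv2D, in model order.
convs = [(4, 16), (16, 32), (32, 32),            # encoder
         (1, 32), (32, 32), (32, 16), (16, 3)]   # decoder
conv_counts = [conv2d_params(3, i, o) for i, o in convs]

lstm_count = lstm_params(16 * 16 * 32, 64)
total = sum(conv_counts) + lstm_count

print(conv_counts)  # [592, 4640, 9248, 320, 9248, 4624, 435]
print(lstm_count)   # 2113792
print(total)        # 2142899
```

Almost all of the parameters sit in the LSTM, because it takes the 8192-dimensional flattened map as input at each timestep.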
Further notes:
As the number of layers and parameters increases (i.e. the model becomes deeper), you may need to deal with problems such as vanishing gradients (1, 2) and overfitting (1, 2, 3). One solution for the former is to use a BatchNormalization layer right after each (trainable) layer to ensure that the data being fed to the next layer is normalized. To prevent overfitting, you could use Dropout layers (and/or set the dropout and recurrent_dropout arguments of the LSTM layer).
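A minimal sketch of where these layers could go, using a shortened encoder with small, hypothetical sizes so that it builds quickly. The dropout rates are arbitrary placeholders, not recommendations, and the placement is just one reasonable option.

```python
from keras.models import Sequential
from keras.layers import (TimeDistributed, Conv2D, MaxPooling2D,
                          BatchNormalization, Dropout, Flatten, LSTM)

# Small hypothetical sizes, only for illustration.
num_steps, lats, lons, features = 5, 16, 16, 4

model = Sequential()
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same'),
                          input_shape=(num_steps, lats, lons, features)))
# BatchNormalization and Dropout accept tensors of any rank, so here they are
# applied directly instead of being wrapped in TimeDistributed.
model.add(BatchNormalization())   # normalizes each channel of the conv output
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(Dropout(0.25))          # assumed rate, tune as needed
model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=64, return_sequences=True,
               dropout=0.2, recurrent_dropout=0.2))  # assumed rates

print(model.output_shape)  # (None, 5, 64)
```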
As you can see above, I have assumed that we are feeding the model a timeseries of length 50. This is part of the data preprocessing step, where you need to create windowed training (and test) samples from your whole (long) timeseries and feed them in batches to your model for training.
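One simple way to create such windows with NumPy is sketched below. The function name make_windows, the stride, and the array sizes are assumptions for illustration; the targets would be cut with the same windows.

```python
import numpy as np

def make_windows(series, window, stride=1):
    """Cut a (time, lats, lons, features) array into overlapping windows.

    Returns an array of shape (num_windows, window, lats, lons, features).
    """
    starts = range(0, series.shape[0] - window + 1, stride)
    return np.stack([series[s:s + window] for s in starts])

# Hypothetical long series: 200 timesteps of 8x8 maps with 4 features.
series = np.random.rand(200, 8, 8, 4).astype(np.float32)

samples = make_windows(series, window=50, stride=10)
print(samples.shape)  # (16, 50, 8, 8, 4)
```

A smaller stride gives more (but more strongly overlapping) samples. To avoid leakage, split the series into training and test portions before windowing.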
As I have commented in the code, you can stack multiple LSTM layers on top of each other to increase the representational capacity of the network. But be aware that this may increase the training time and make your model (much more) prone to overfitting. So do it only if you have justified reasons (e.g. you have experimented with one LSTM layer and have not gotten good results). Alternatively, you can use GRU layers instead, but there might be a tradeoff between representational capacity and computational cost (i.e. training time) compared to the LSTM layer.
To make the output shape of the network compatible with the shape of your data, you could use a Dense layer after the LSTM layer(s) or adjust the number of units of the last LSTM layer.
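For example, if the last LSTM layer has more units than the Reshape target can hold, a per-timestep Dense layer can map its output to the required size. This is a sketch with made-up sizes (128 LSTM units, a 256-dimensional input), not part of the model above.

```python
from keras.models import Sequential
from keras.layers import TimeDistributed, LSTM, Dense, Reshape

# Hypothetical sizes: 50 timesteps, 256 flattened features per timestep.
num_steps, flat_feats = 50, 256

model = Sequential()
model.add(LSTM(units=128, return_sequences=True,
               input_shape=(num_steps, flat_feats)))
# Map the 128 LSTM outputs to 8 * 8 * 1 = 64 values at every timestep ...
model.add(TimeDistributed(Dense(64)))
# ... so that they can be reshaped into an 8x8 single-channel map.
model.add(TimeDistributed(Reshape((8, 8, 1))))

print(model.output_shape)  # (None, 50, 8, 8, 1)
```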
Obviously, the above code is just for demonstration, and you may need to tune its hyperparameters (e.g. number of layers, number of filters, kernel size, optimizer, activation functions, etc.) and experiment (a lot!) to arrive at a final working model with good accuracy.
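If you are training on a GPU, you can use a CuDNNLSTM (CuDNNGRU) layer instead of LSTM (GRU) to increase training speed, as it has been optimized for GPUs. (In TensorFlow 2.x the standard LSTM and GRU layers use the cuDNN kernel automatically on a GPU when their arguments are compatible with it, e.g. the default activations and no recurrent dropout.)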
And don't forget to normalize the training data (it's very important and helps the training process a lot).
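The answer does not say which normalization to use; per-feature standardization (zero mean, unit variance) is one common choice and is what this sketch assumes. The arrays are random placeholders standing in for your windowed samples.

```python
import numpy as np

# Hypothetical training and test sets: (samples, time, lats, lons, features).
rng = np.random.default_rng(0)
x_train = rng.normal(loc=5.0, scale=3.0, size=(8, 10, 4, 4, 2))
x_test = rng.normal(loc=5.0, scale=3.0, size=(2, 10, 4, 4, 2))

# One mean and standard deviation per feature, computed on training data only.
axes = (0, 1, 2, 3)
mean = x_train.mean(axis=axes, keepdims=True)
std = x_train.std(axis=axes, keepdims=True) + 1e-8  # guard against zero variance

x_train_norm = (x_train - mean) / std
x_test_norm = (x_test - mean) / std  # reuse the training statistics
```

Computing the statistics on the training set only, and reusing them for the test set, keeps information from the test data out of training.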
(num_grids, features, lats, lons, time)? – today
(4, 180, 360, 100)? That would be too little data. How many timesteps are there then? Maybe the length of timeseries is too long?! – today