19
votes

I would like to use a 1D-Conv layer followed by an LSTM layer to classify a 16-channel, 400-timestep signal.

The input shape is composed of:

  • X = (n_samples, n_timesteps, n_features), where n_samples=476, n_timesteps=400, n_features=16 are the number of samples, timesteps, and features (or channels) of the signal.

  • y = (n_samples, n_timesteps, 1). Each timestep is labeled by either 0 or 1 (binary classification).

I use the 1D-Conv to extract the temporal information, as shown in the figure below. F=32 and K=8 are the number of filters and the kernel size. 1D max pooling is applied after the 1D-Conv, and a 32-unit LSTM is used for classification. The model should return y_pred = (n_samples, n_timesteps, 1).

[figure: model architecture]

The code snippet is shown below:

from keras.layers import Input, Dense, LSTM, MaxPooling1D, Conv1D
from keras.models import Model

input_layer = Input(shape=(dataset.n_timestep, dataset.n_feature))
conv1 = Conv1D(filters=32,
               kernel_size=8,
               strides=1,
               activation='relu')(input_layer)
pool1 = MaxPooling1D(pool_size=4)(conv1)
lstm1 = LSTM(32)(pool1)
output_layer = Dense(1, activation='sigmoid')(lstm1)
model = Model(inputs=input_layer, outputs=output_layer)

The model summary is shown below:

[figure: model summary]

However, I got the following error:

ValueError: Error when checking target: expected dense_15 to have 2 dimensions, but got array with shape (476, 400, 1).

I guess the problem is a mismatch between the model's output shape and the target shape. Please let me know how to fix it.

Another question concerns the number of timesteps: since the input_shape is assigned to the 1D-Conv layer, how can we let the LSTM know that the timestep count must be 400?


I would like to add the model graph based on the suggestion of @today. In this case, the timestep count seen by the LSTM will be 98. Do we need to use TimeDistributed in this case? I failed to apply TimeDistributed to the Conv1D.

[figure: updated model graph]

Is there any way to perform the convolution across channels instead of across timesteps? For example, a filter of shape (2, 1) would traverse each timestep, as shown in the figure below.

[figure: channel-wise convolution]

Thanks.

2
Could it be that you need to use TimeDistributed(Dense(1)) instead of Dense(1)? – swiftg
To answer the last part of your question: convolution reduces the input length by a certain factor due to the nature of the mathematical operation. To counter this you need to use padding, i.e. set padding='same' in the Conv1D. – sgDysregulation
@GurmeetSingh To apply TimeDistributed, the return_sequences argument of the LSTM layer must be set to True. And even after doing this, TimeDistributed(Dense(1)) is the same as Dense(1). – today

2 Answers

8
votes

If you want to predict one value for each timestep, two slightly different solutions come to mind:

1) Remove the MaxPooling1D layer, add padding='same' to the Conv1D layer, and pass return_sequences=True to the LSTM so that it returns the output of every timestep:

from keras.layers import Input, Dense, LSTM, MaxPooling1D, Conv1D
from keras.models import Model

input_layer = Input(shape=(400, 16))
conv1 = Conv1D(filters=32,
               kernel_size=8,
               strides=1,
               activation='relu',
               padding='same')(input_layer)
lstm1 = LSTM(32, return_sequences=True)(conv1)
output_layer = Dense(1, activation='sigmoid')(lstm1)
model = Model(inputs=input_layer, outputs=output_layer)

model.summary()

The model summary would be:

Layer (type)                 Output Shape              Param #   
=================================================================
input_4 (InputLayer)         (None, 400, 16)           0         
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 400, 32)           4128      
_________________________________________________________________
lstm_4 (LSTM)                (None, 400, 32)           8320      
_________________________________________________________________
dense_4 (Dense)              (None, 400, 1)            33        
=================================================================
Total params: 12,481
Trainable params: 12,481
Non-trainable params: 0
_________________________________________________________________
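The sequence length survives here only because of padding='same': with the default 'valid' padding, a kernel of size 8 shrinks 400 timesteps to 393, and a pool size of 4 then cuts that to 98, the numbers appearing in the question's second graph. As a sanity check, here is the standard length arithmetic written out in plain Python (not Keras-specific code):

```python
import math

def conv1d_out_len(n, kernel_size, strides=1, padding='valid'):
    """Output length of a 1D convolution (standard formula)."""
    if padding == 'same':
        return math.ceil(n / strides)
    # 'valid': the kernel must fit entirely inside the input
    return (n - kernel_size) // strides + 1

def maxpool1d_out_len(n, pool_size):
    """Output length of 1D max pooling with default stride == pool_size."""
    return n // pool_size

print(conv1d_out_len(400, 8, padding='same'))    # 400: length preserved
print(conv1d_out_len(400, 8, padding='valid'))   # 393
print(maxpool1d_out_len(393, 4))                 # 98
```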

2) Just change the number of units in the Dense layer to 400 and reshape y to (n_samples, n_timesteps):

from keras.layers import Input, Dense, LSTM, MaxPooling1D, Conv1D
from keras.models import Model

input_layer = Input(shape=(400, 16))
conv1 = Conv1D(filters=32,
               kernel_size=8,
               strides=1,
               activation='relu')(input_layer)
pool1 = MaxPooling1D(pool_size=4)(conv1)
lstm1 = LSTM(32)(pool1)
output_layer = Dense(400, activation='sigmoid')(lstm1)
model = Model(inputs=input_layer, outputs=output_layer)

model.summary()

The model summary would be:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_6 (InputLayer)         (None, 400, 16)           0         
_________________________________________________________________
conv1d_6 (Conv1D)            (None, 393, 32)           4128      
_________________________________________________________________
max_pooling1d_5 (MaxPooling1 (None, 98, 32)            0         
_________________________________________________________________
lstm_6 (LSTM)                (None, 32)                8320      
_________________________________________________________________
dense_6 (Dense)              (None, 400)               13200     
=================================================================
Total params: 25,648
Trainable params: 25,648
Non-trainable params: 0
_________________________________________________________________

Don't forget that in both cases you must use 'binary_crossentropy' (not 'categorical_crossentropy') as the loss function. I expect solution #2 to have a lower accuracy than solution #1, but you should experiment with both and try changing the parameters, since everything depends on the specific problem you are trying to solve and the nature of the data you have.
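In solution #2 the targets also need the matching shape: the network ends in Dense(400), so each sample's labels must be a flat vector of 400 values rather than (400, 1). A small NumPy sketch with dummy labels (the array contents are placeholders for your real y):

```python
import numpy as np

n_samples, n_timesteps = 476, 400          # sizes from the question
y = np.zeros((n_samples, n_timesteps, 1))  # dummy per-timestep binary labels

y = y.reshape(n_samples, n_timesteps)      # now matches the Dense(400) output
print(y.shape)  # (476, 400)
```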


Update:

You asked for a convolution layer that covers only one timestep and k adjacent features. Yes, you can do it using a Conv2D layer:

import numpy as np

# first add a channel axis to your data
X = np.expand_dims(X, axis=-1)   # now X has shape (n_samples, n_timesteps, n_feats, 1)

# adjust the input layer shape ...
conv2 = Conv2D(n_filters, (1, k), ...)   # covers one timestep and k adjacent features
# adjust the other layers according to the output of the convolution layer ...

Although I have no idea why you are doing this, to use the output of the convolution layer (which has shape (?, n_timesteps, n_features, n_filters)), one solution is to use an LSTM layer wrapped inside a TimeDistributed layer. Another solution is to flatten the last two axes.
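Both of those follow-ups are essentially axis bookkeeping, which you can check with NumPy alone before touching the model (toy sizes, zero-filled stand-ins for the real tensors):

```python
import numpy as np

n_samples, n_timesteps, n_feats, n_filters = 4, 400, 16, 32  # toy sizes

# adding the channel axis needed by Conv2D
X = np.zeros((n_samples, n_timesteps, n_feats))
X = np.expand_dims(X, axis=-1)
print(X.shape)  # (4, 400, 16, 1)

# flattening the last two axes of a (samples, timesteps, feats, filters)
# Conv2D output so an ordinary LSTM can consume (timesteps, feats*filters)
conv_out = np.zeros((n_samples, n_timesteps, n_feats, n_filters))
flat = conv_out.reshape(n_samples, n_timesteps, n_feats * n_filters)
print(flat.shape)  # (4, 400, 512)
```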

1
votes

The input and output shapes of your model are (476, 400, 16) and (476, 1), which means it outputs just one value per full sequence.

Your LSTM is not returning sequences (return_sequences=False). And even if it were, the Conv1D and MaxPooling before the LSTM squeeze the input, so the LSTM itself receives samples of shape (98, 32).

I assume you want one output for each input timestep.

Assuming that the Conv1D and MaxPooling are relevant for your input data, you can try a sequence-to-sequence approach in which the output of the first network is fed to another network that produces the 400 outputs.

I recommend looking at encoder-decoder seq2seq models such as these:

https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html

https://machinelearningmastery.com/define-encoder-decoder-sequence-sequence-model-neural-machine-translation-keras/
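A minimal sketch of that idea, assuming the same 400-step, 16-channel input from the question: an encoder LSTM compresses the sequence into one vector, RepeatVector copies it 400 times, and a decoder LSTM plus TimeDistributed(Dense(1)) emits one sigmoid per timestep. This is only one way to wire such an encoder-decoder, and the layer sizes are illustrative:

```python
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from keras.models import Model

n_timesteps, n_features = 400, 16  # sizes taken from the question

inp = Input(shape=(n_timesteps, n_features))
encoded = LSTM(32)(inp)                               # (None, 32): one vector per sequence
repeated = RepeatVector(n_timesteps)(encoded)         # (None, 400, 32)
decoded = LSTM(32, return_sequences=True)(repeated)   # (None, 400, 32)
out = TimeDistributed(Dense(1, activation='sigmoid'))(decoded)  # (None, 400, 1)

model = Model(inputs=inp, outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy')
print(model.output_shape)  # (None, 400, 1)
```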