
I have been trying to replicate the approach from a previous question on combining an LSTM with a CNN: How to combine LSTM and CNN in timeseries classification

However, for one reason or another, my val_accuracy has been stuck at 0.4166 since the first epoch.

Oddly, this value stays roughly the same no matter the model architecture. This makes me think something is wrong somewhere, but I don't know where to start troubleshooting.

Some background on the data:

  1. Multivariate time series data (5 time steps x 20 features) with 3 possible classes.

  2. Input shapes for the training/validation/test sets are (180000, 5, 20) / (60000, 5, 20) / (60000, 5, 20).

  3. The X arrays were standardized with sklearn's StandardScaler, fitted on the training set and then applied to the validation and test sets; the y labels were one-hot encoded (a rough sketch of this preprocessing is shown below).
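
For reference, the preprocessing looked roughly like this (variable names such as X_train and y_train_raw are illustrative):

from sklearn.preprocessing import StandardScaler
from tensorflow.keras.utils import to_categorical

# StandardScaler expects 2D input, so flatten the time axis, fit on the
# training set only, and reuse the same fitted scaler for val/test.
n_steps, n_feats = X_train.shape[1], X_train.shape[2]
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.reshape(-1, n_feats)).reshape(-1, n_steps, n_feats)
X_val_scaled   = scaler.transform(X_val.reshape(-1, n_feats)).reshape(-1, n_steps, n_feats)
X_test_scaled  = scaler.transform(X_test.reshape(-1, n_feats)).reshape(-1, n_steps, n_feats)

# One-hot encode the integer class labels (3 classes)
y_train = to_categorical(y_train_raw, num_classes=3)
y_val   = to_categorical(y_val_raw, num_classes=3)
y_test  = to_categorical(y_test_raw, num_classes=3)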

Example model using LSTM and CNN:

from tensorflow import keras

model = keras.Sequential()
# LSTM over the (5, 20) input, returning the full sequence so Conv1D can follow
model.add(keras.layers.LSTM(200, return_sequences=True,
                            input_shape=(X_train_scaled.shape[1], X_train_scaled.shape[2])))
# 1D convolution over the LSTM outputs, then global max pooling
model.add(keras.layers.Conv1D(200, kernel_size=3, activation='relu'))
model.add(keras.layers.GlobalMaxPooling1D())
model.add(keras.layers.Dense(100))  # note: no activation specified, so this layer is linear
model.add(keras.layers.Dense(y_train.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['acc'])
Output of the fit function on the model:
Epoch 1/20
2828/2828 [==============================] - 115s 40ms/step - loss: 1.0861 - acc: 0.4100 - val_loss: 1.0836 - val_acc: 0.4166
Epoch 2/20
2828/2828 [==============================] - 108s 38ms/step - loss: 1.0837 - acc: 0.4164 - val_loss: 1.0838 - val_acc: 0.4166
Epoch 3/20
2828/2828 [==============================] - 114s 40ms/step - loss: 1.0828 - acc: 0.4184 - val_loss: 1.0833 - val_acc: 0.4165
Epoch 4/20
2828/2828 [==============================] - 111s 39ms/step - loss: 1.0830 - acc: 0.4175 - val_loss: 1.0837 - val_acc: 0.4166
Epoch 5/20
2828/2828 [==============================] - 74s 26ms/step - loss: 1.0834 - acc: 0.4161 - val_loss: 1.0835 - val_acc: 0.4164

EDIT: after looking more carefully at my data, I now get something like this:

Epoch 1/20
2828/2828 [==============================] - 129s 45ms/step - loss: 0.9560 - acc: 0.5143 - val_loss: 0.9044 - val_acc: 0.5479
Epoch 2/20
2828/2828 [==============================] - 131s 46ms/step - loss: 0.8977 - acc: 0.5520 - val_loss: 0.8937 - val_acc: 0.5527
Epoch 3/20
2828/2828 [==============================] - 116s 41ms/step - loss: 0.8887 - acc: 0.5559 - val_loss: 0.8982 - val_acc: 0.5519
Epoch 4/20
2828/2828 [==============================] - 95s 33ms/step - loss: 0.8820 - acc: 0.5616 - val_loss: 0.8834 - val_acc: 0.5606
Epoch 5/20
2828/2828 [==============================] - 100s 35ms/step - loss: 0.8786 - acc: 0.5624 - val_loss: 0.8823 - val_acc: 0.5580
Epoch 6/20
2828/2828 [==============================] - 82s 29ms/step - loss: 0.8728 - acc: 0.5661 - val_loss: 0.8797 - val_acc: 0.5628
Epoch 7/20
2828/2828 [==============================] - 120s 42ms/step - loss: 0.8723 - acc: 0.5679 - val_loss: 0.8744 - val_acc: 0.5677
Epoch 8/20
2828/2828 [==============================] - 158s 56ms/step - loss: 0.8686 - acc: 0.5670 - val_loss: 0.8733 - val_acc: 0.5679
Epoch 9/20
2828/2828 [==============================] - 146s 51ms/step - loss: 0.8646 - acc: 0.5714 - val_loss: 0.8764 - val_acc: 0.5667
Epoch 10/20
2828/2828 [==============================] - 134s 47ms/step - loss: 0.8632 - acc: 0.5720 - val_loss: 0.8715 - val_acc: 0.5701
Epoch 11/20
2828/2828 [==============================] - 141s 50ms/step - loss: 0.8612 - acc: 0.5734 - val_loss: 0.8721 - val_acc: 0.5694
Epoch 12/20
2828/2828 [==============================] - 151s 53ms/step - loss: 0.8582 - acc: 0.5753 - val_loss: 0.8690 - val_acc: 0.5713
Epoch 13/20
2828/2828 [==============================] - 137s 49ms/step - loss: 0.8554 - acc: 0.5792 - val_loss: 0.8694 - val_acc: 0.5699
Epoch 14/20
2828/2828 [==============================] - 121s 43ms/step - loss: 0.8541 - acc: 0.5779 - val_loss: 0.8709 - val_acc: 0.5691
Epoch 15/20
2828/2828 [==============================] - 134s 47ms/step - loss: 0.8476 - acc: 0.5826 - val_loss: 0.8643 - val_acc: 0.5766
Epoch 16/20
2828/2828 [==============================] - 137s 48ms/step - loss: 0.8453 - acc: 0.5838 - val_loss: 0.8664 - val_acc: 0.5742
Epoch 17/20
2828/2828 [==============================] - 152s 54ms/step - loss: 0.8409 - acc: 0.5872 - val_loss: 0.8716 - val_acc: 0.5683
Epoch 18/20
2828/2828 [==============================] - 150s 53ms/step - loss: 0.8391 - acc: 0.5892 - val_loss: 0.8663 - val_acc: 0.5726
Epoch 19/20
2828/2828 [==============================] - 133s 47ms/step - loss: 0.8341 - acc: 0.5920 - val_loss: 0.8687 - val_acc: 0.5766
Epoch 20/20
2828/2828 [==============================] - 117s 41ms/step - loss: 0.8331 - acc: 0.5913 - val_loss: 0.8643 - val_acc: 0.5764

1 Answer


Your instinct to start with the model architecture is understandable, but the real issue in your case most likely lies in how your dataset is built.

The constant accuracy you are seeing is, in all likelihood, because the network predicts only one class (in practice, one of your classes probably makes up about 41% of the targets).
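
You can verify this quickly. A minimal check, assuming your one-hot labels are in y_train, the scaled validation inputs in X_val_scaled, and the fitted model in model:

import numpy as np

# Class balance of the training labels
train_classes = y_train.argmax(axis=1)
print(np.bincount(train_classes) / len(train_classes))

# Distribution of the model's predictions on the validation set
val_preds = model.predict(X_val_scaled).argmax(axis=1)
print(np.bincount(val_preds, minlength=y_train.shape[1]) / len(val_preds))

If one class sits around 41% of the labels and almost all predictions fall into it, the network has simply collapsed to the majority class.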

My suggestion is to focus on the data, not the model architecture. Going from 100 units to 200 will not solve your problem; what I would still do, though, is switch to the 'adam' optimizer by default and fine-tune the learning rate.
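
As a sketch, only the compile call needs to change; the learning rate below is just a starting point to tune from, not a recommendation:

from tensorflow import keras

# Same model, compiled with Adam and an explicit learning rate to tune
model.compile(loss='categorical_crossentropy',
              optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              metrics=['acc'])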

That said, do you have enough timesteps for the classification you are trying to make? It is likely that 5 timesteps are not enough for the model to capture a pattern in your data.
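
If you have access to the original, un-windowed series (an assumption on my part; raw_series and raw_labels below are placeholders), you could rebuild the samples with longer windows, e.g.:

import numpy as np

def make_windows(series, labels, window=20):
    # series: (T, n_features) continuous data; labels: (T,) integer classes.
    # Each window is labelled with the class at its last timestep.
    X, y = [], []
    for end in range(window, len(series) + 1):
        X.append(series[end - window:end])
        y.append(labels[end - 1])
    return np.array(X), np.array(y)

# e.g. 20 timesteps per sample instead of 5
X_long, y_long = make_windows(raw_series, raw_labels, window=20)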

Are all 20 features really necessary? You could train a RandomForest/XGBoost and use its feature importances to drop the features that contribute little to the dependent variable (y).
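
A rough way to do that, flattening the timesteps so a tree model can consume them (names follow your question; the exact setup is just a sketch):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Flatten (samples, 5, 20) -> (samples, 100) and use integer class labels
X_flat = X_train_scaled.reshape(len(X_train_scaled), -1)
y_int = y_train.argmax(axis=1)

rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
rf.fit(X_flat, y_int)

# Average the per-timestep importances back onto the 20 original features
importances = rf.feature_importances_.reshape(5, 20).mean(axis=0)
print(np.argsort(importances)[::-1])  # features ranked most -> least important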

Start by iterating on the dataset and on the task you are trying to solve. Make sure the time series itself carries signal and is not pure noise. Try overfitting on a small portion of the dataset, and only then proceed to training on the full dataset.
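
For the overfitting sanity check, something along these lines: take a small slice of the training data and train until training accuracy approaches 1.0; if it never does, the problem is in the data or pipeline rather than in model capacity.

# A model that cannot memorize 1000 samples points to a data/pipeline problem
X_small, y_small = X_train_scaled[:1000], y_train[:1000]
history = model.fit(X_small, y_small, epochs=100, batch_size=32, verbose=0)
print(history.history['acc'][-1])  # should approach 1.0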