
I want to train a model to predict a person's emotion from physical signals. I have a physical signal and I am using it as the input feature.


In my dataset, there are 312 total records belonging to the participants and there are 18000 rows of data in each record. So when I combine them into a single data frame, there are 5616000 rows in total.

Here is my train_x dataframe;

0        0.1912 
1        0.3597 
2        0.3597 
3        0.3597 
4        0.3597 
5        0.3597 
6        0.2739 
7        0.1641 
8        0.0776 
9        0.0005 
10      -0.0375 
11      -0.0676 
12      -0.1071 
13      -0.1197 
..      ....... 
..      ....... 
..      ....... 
5616000 0.0226  

I have 6 classes corresponding to emotions. I have encoded these labels as numbers:

anger = 0, calmness = 1, disgust = 2, fear = 3, happiness = 4, sadness = 5

Here is my train_y;

0              0
1              0
2              0
3              0
4              0
.              .
.              .
.              .
18001          1
18002          1
18003          1
.              .
.              .
.              .
360001         2
360002         2
360003         2
.              .
.              .
.              .
.              .
5616000        5

To feed my CNN, I am reshaping the train_x and one hot encoding the train_y data.

import pandas as pd

train_x = train_x.values.reshape(312, 18000, 1)  # (records, time steps, channels)
train_y = train_y.values.reshape(312, 18000)
train_y = train_y[:, :1]  # keep one label per record, since each complete signal has a single label
train_y = pd.DataFrame(train_y)
train_y = pd.get_dummies(train_y[0])  # one-hot encoded labels
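
Before truncating train_y to one label per record, it may be worth confirming that every record really carries a single label across all of its rows. The following is a minimal sketch with NumPy on a small made-up array, not the real data; the helper names `labels_per_record` and `one_hot` are hypothetical.

```python
import numpy as np

def labels_per_record(flat_labels, n_records, rows_per_record):
    """Return one label per record, checking each record has a single label."""
    labels = np.asarray(flat_labels).reshape(n_records, rows_per_record)
    # Every row of a record must equal that record's first label.
    if not (labels == labels[:, :1]).all():
        raise ValueError("some records contain more than one label")
    return labels[:, 0]

def one_hot(labels, n_classes):
    """One-hot encode integer labels into an (n_records, n_classes) array."""
    return np.eye(n_classes, dtype=int)[labels]

# Toy stand-in for train_y: 3 records of 4 rows each (the real data is 312 x 18000).
toy_y = [0, 0, 0, 0, 1, 1, 1, 1, 5, 5, 5, 5]
record_labels = labels_per_record(toy_y, n_records=3, rows_per_record=4)
encoded = one_hot(record_labels, n_classes=6)
print(record_labels)   # one label per record
print(encoded.shape)   # (3, 6)
```

Unlike pd.get_dummies, this encoding always produces six columns, even if one emotion happens to be missing from the labels.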

After these steps, here is what they look like. train_x after reshape:



  [1.        ]
  [1.        ]


  [0.7756791 ]
  [0.7756791 ]

  [0.0391334 ]
  [0.0391334 ]
  [0.0578706 ]]

 [[0.5786066 ]
  [0.4112712 ]]]

train_y after one hot encoding;

    0  1  2  3  4  5
0    1  0  0  0  0  0
1    1  0  0  0  0  0
2    0  1  0  0  0  0
3    0  1  0  0  0  0
4    0  0  0  0  0  1
5    0  0  0  0  0  1
6    0  0  1  0  0  0
7    0  0  1  0  0  0
8    0  0  0  1  0  0
9    0  0  0  1  0  0
10   0  0  0  0  1  0
11   0  0  0  0  1  0
12   0  0  0  1  0  0
13   0  0  0  1  0  0
14   0  1  0  0  0  0
15   0  1  0  0  0  0
16   1  0  0  0  0  0
17   1  0  0  0  0  0
18   0  0  1  0  0  0
19   0  0  1  0  0  0
20   0  0  0  0  1  0
21   0  0  0  0  1  0
22   0  0  0  0  0  1
23   0  0  0  0  0  1
24   0  0  0  0  0  1
25   0  0  0  0  0  1
26   0  0  1  0  0  0
27   0  0  1  0  0  0
28   0  1  0  0  0  0
29   0  1  0  0  0  0
..  .. .. .. .. .. ..
282  0  0  0  1  0  0
283  0  0  0  1  0  0
284  1  0  0  0  0  0
285  1  0  0  0  0  0
286  0  0  0  0  1  0
287  0  0  0  0  1  0
288  1  0  0  0  0  0
289  1  0  0  0  0  0
290  0  1  0  0  0  0
291  0  1  0  0  0  0
292  0  0  0  1  0  0
293  0  0  0  1  0  0
294  0  0  1  0  0  0
295  0  0  1  0  0  0
296  0  0  0  0  0  1
297  0  0  0  0  0  1
298  0  0  0  0  1  0
299  0  0  0  0  1  0
300  0  0  0  1  0  0
301  0  0  0  1  0  0
302  0  0  1  0  0  0
303  0  0  1  0  0  0
304  0  0  0  0  0  1
305  0  0  0  0  0  1
306  0  1  0  0  0  0
307  0  1  0  0  0  0
308  0  0  0  0  1  0
309  0  0  0  0  1  0
310  1  0  0  0  0  0
311  1  0  0  0  0  0

[312 rows x 6 columns]

After reshaping, I have created my CNN model;

import keras
from keras.models import Sequential
from keras.layers import Conv1D

model = Sequential()
# kernel_size is 700 because 18000 rows = 60 seconds, so 700 rows = ~2.33 seconds,
# and there are two heartbeat peaks in every 2 seconds of an ECG signal.
model.add(Conv1D(100, 700, activation='relu', input_shape=(18000, 1)))

adam = keras.optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)

model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['acc'])
model.fit(train_x, train_y, epochs=50, batch_size=32, validation_split=0.33, shuffle=False)

The problem is that the accuracy does not go above 0.2 and keeps fluctuating up and down. It looks like the model does not learn anything. I have tried to increase layers, play with the learning rate, changing the loss function, changing the optimizer, scaling the data, normalizing the data, but nothing helped me to solve this problem. I also tried more simple Dense models or LSTM models but I can't find a way which works.

How can I solve this problem? Thanks in advance.


Here are the training results for the first 50 epochs:

Epoch 1/80
249/249 [==============================] - 24s 96ms/step - loss: 2.3118 - acc: 0.1406 - val_loss: 1.7989 - val_acc: 0.1587
Epoch 2/80
249/249 [==============================] - 19s 76ms/step - loss: 2.0468 - acc: 0.1647 - val_loss: 1.8605 - val_acc: 0.2222
Epoch 3/80
249/249 [==============================] - 19s 76ms/step - loss: 1.9562 - acc: 0.1767 - val_loss: 1.8203 - val_acc: 0.2063
Epoch 4/80
249/249 [==============================] - 19s 75ms/step - loss: 1.9361 - acc: 0.2169 - val_loss: 1.8033 - val_acc: 0.1905
Epoch 5/80
249/249 [==============================] - 19s 74ms/step - loss: 1.8834 - acc: 0.1847 - val_loss: 1.8198 - val_acc: 0.2222
Epoch 6/80
249/249 [==============================] - 19s 75ms/step - loss: 1.8278 - acc: 0.2410 - val_loss: 1.7961 - val_acc: 0.1905
Epoch 7/80
249/249 [==============================] - 19s 75ms/step - loss: 1.8022 - acc: 0.2450 - val_loss: 1.8092 - val_acc: 0.2063
Epoch 8/80
249/249 [==============================] - 19s 75ms/step - loss: 1.7959 - acc: 0.2369 - val_loss: 1.8005 - val_acc: 0.2222
Epoch 9/80
249/249 [==============================] - 19s 75ms/step - loss: 1.7234 - acc: 0.2610 - val_loss: 1.7871 - val_acc: 0.2381
Epoch 10/80
249/249 [==============================] - 19s 75ms/step - loss: 1.6861 - acc: 0.2972 - val_loss: 1.8017 - val_acc: 0.1905
Epoch 11/80
249/249 [==============================] - 19s 75ms/step - loss: 1.6696 - acc: 0.3173 - val_loss: 1.7878 - val_acc: 0.1905
Epoch 12/80
249/249 [==============================] - 19s 75ms/step - loss: 1.5868 - acc: 0.3655 - val_loss: 1.7771 - val_acc: 0.1270
Epoch 13/80
249/249 [==============================] - 19s 75ms/step - loss: 1.5751 - acc: 0.3936 - val_loss: 1.7818 - val_acc: 0.1270
Epoch 14/80
249/249 [==============================] - 19s 75ms/step - loss: 1.5647 - acc: 0.3735 - val_loss: 1.7733 - val_acc: 0.1429
Epoch 15/80
249/249 [==============================] - 19s 75ms/step - loss: 1.4621 - acc: 0.4177 - val_loss: 1.7759 - val_acc: 0.1270
Epoch 16/80
249/249 [==============================] - 19s 75ms/step - loss: 1.4519 - acc: 0.4498 - val_loss: 1.8005 - val_acc: 0.1746
Epoch 17/80
249/249 [==============================] - 19s 75ms/step - loss: 1.4489 - acc: 0.4378 - val_loss: 1.8020 - val_acc: 0.1270
Epoch 18/80
249/249 [==============================] - 19s 75ms/step - loss: 1.4449 - acc: 0.4297 - val_loss: 1.7852 - val_acc: 0.1587
Epoch 19/80
249/249 [==============================] - 19s 75ms/step - loss: 1.3600 - acc: 0.5301 - val_loss: 1.7922 - val_acc: 0.1429
Epoch 20/80
249/249 [==============================] - 19s 75ms/step - loss: 1.3349 - acc: 0.5422 - val_loss: 1.8061 - val_acc: 0.2222
Epoch 21/80
249/249 [==============================] - 19s 75ms/step - loss: 1.2885 - acc: 0.5622 - val_loss: 1.8235 - val_acc: 0.1746
Epoch 22/80
249/249 [==============================] - 19s 75ms/step - loss: 1.2291 - acc: 0.5823 - val_loss: 1.8173 - val_acc: 0.1905
Epoch 23/80
249/249 [==============================] - 19s 75ms/step - loss: 1.1890 - acc: 0.6506 - val_loss: 1.8293 - val_acc: 0.1905
Epoch 24/80
249/249 [==============================] - 19s 75ms/step - loss: 1.1473 - acc: 0.6627 - val_loss: 1.8274 - val_acc: 0.1746
Epoch 25/80
249/249 [==============================] - 19s 75ms/step - loss: 1.1060 - acc: 0.6747 - val_loss: 1.8142 - val_acc: 0.1587
Epoch 26/80
249/249 [==============================] - 19s 75ms/step - loss: 1.0210 - acc: 0.7510 - val_loss: 1.8126 - val_acc: 0.1905
Epoch 27/80
249/249 [==============================] - 19s 75ms/step - loss: 0.9699 - acc: 0.7631 - val_loss: 1.8094 - val_acc: 0.1746
Epoch 28/80
249/249 [==============================] - 19s 75ms/step - loss: 0.9127 - acc: 0.8193 - val_loss: 1.8012 - val_acc: 0.1746
Epoch 29/80
249/249 [==============================] - 19s 75ms/step - loss: 0.9176 - acc: 0.7871 - val_loss: 1.8371 - val_acc: 0.1746
Epoch 30/80
249/249 [==============================] - 19s 75ms/step - loss: 0.8725 - acc: 0.8233 - val_loss: 1.8215 - val_acc: 0.1587
Epoch 31/80
249/249 [==============================] - 19s 75ms/step - loss: 0.8316 - acc: 0.8514 - val_loss: 1.8010 - val_acc: 0.1429
Epoch 32/80
249/249 [==============================] - 19s 75ms/step - loss: 0.7958 - acc: 0.8474 - val_loss: 1.8594 - val_acc: 0.1270
Epoch 33/80
249/249 [==============================] - 19s 75ms/step - loss: 0.7452 - acc: 0.8795 - val_loss: 1.8260 - val_acc: 0.1587
Epoch 34/80
249/249 [==============================] - 19s 75ms/step - loss: 0.7395 - acc: 0.8916 - val_loss: 1.8191 - val_acc: 0.1587
Epoch 35/80
249/249 [==============================] - 19s 75ms/step - loss: 0.6794 - acc: 0.9357 - val_loss: 1.8344 - val_acc: 0.1429
Epoch 36/80
249/249 [==============================] - 19s 75ms/step - loss: 0.6106 - acc: 0.9357 - val_loss: 1.7903 - val_acc: 0.1111
Epoch 37/80
249/249 [==============================] - 19s 75ms/step - loss: 0.5609 - acc: 0.9598 - val_loss: 1.7882 - val_acc: 0.1429
Epoch 38/80
249/249 [==============================] - 19s 75ms/step - loss: 0.5788 - acc: 0.9478 - val_loss: 1.8036 - val_acc: 0.1905
Epoch 39/80
249/249 [==============================] - 19s 75ms/step - loss: 0.5693 - acc: 0.9398 - val_loss: 1.7712 - val_acc: 0.1746
Epoch 40/80
249/249 [==============================] - 19s 75ms/step - loss: 0.4911 - acc: 0.9598 - val_loss: 1.8497 - val_acc: 0.1429
Epoch 41/80
249/249 [==============================] - 19s 75ms/step - loss: 0.4824 - acc: 0.9518 - val_loss: 1.8105 - val_acc: 0.1429
Epoch 42/80
249/249 [==============================] - 19s 75ms/step - loss: 0.4198 - acc: 0.9759 - val_loss: 1.8332 - val_acc: 0.1111
Epoch 43/80
249/249 [==============================] - 19s 75ms/step - loss: 0.3890 - acc: 0.9880 - val_loss: 1.9316 - val_acc: 0.1111
Epoch 44/80
249/249 [==============================] - 19s 75ms/step - loss: 0.3762 - acc: 0.9920 - val_loss: 1.8333 - val_acc: 0.1746
Epoch 45/80
249/249 [==============================] - 19s 75ms/step - loss: 0.3510 - acc: 0.9880 - val_loss: 1.8090 - val_acc: 0.1587
Epoch 46/80
249/249 [==============================] - 19s 75ms/step - loss: 0.3306 - acc: 0.9880 - val_loss: 1.8230 - val_acc: 0.1587
Epoch 47/80
249/249 [==============================] - 19s 75ms/step - loss: 0.2814 - acc: 1.0000 - val_loss: 1.7843 - val_acc: 0.2222
Epoch 48/80
249/249 [==============================] - 19s 75ms/step - loss: 0.2794 - acc: 1.0000 - val_loss: 1.8147 - val_acc: 0.2063
Epoch 49/80
249/249 [==============================] - 19s 75ms/step - loss: 0.2430 - acc: 1.0000 - val_loss: 1.8488 - val_acc: 0.1587
Epoch 50/80
249/249 [==============================] - 19s 75ms/step - loss: 0.2216 - acc: 1.0000 - val_loss: 1.8215 - val_acc: 0.1587
Your validation loss doesn't change but the training loss decreases. At this point your model has probably overfitted; you can reduce the number of epochs, and also consider simplifying the network to see if that helps. (Disclaimer: I didn't look into the model code itself.) - Paritosh Singh
@ParitoshSingh, yes, it is obviously overfitting, but before it overfits there is still no satisfying result. The accuracy fluctuates the whole time. I have tried simpler models; nothing changed. - Ozan Yurtsever
Frustrating one, for sure. Since validation_split just pulls from the end of the data, could there be a pattern in the record order that makes the validation data substantially different from the training data? Perhaps you've already tried shuffle=True? - TheLoneDeranger
You could try regularization (keras.io/regularizers) to avoid overfitting. - Peter
Maybe the problem is with the data itself? Since you have tried regularization techniques and LSTMs, perhaps there is something wrong with your data. - Timbus Calin
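
A note on the validation_split comment: as far as I know, Keras takes the validation fraction from the last samples of the arrays before any shuffling, so if the records are ordered by participant or emotion, the validation set can differ systematically from the training set. A minimal sketch of shuffling the record indices up front follows; the function name, the seed and the toy arrays are assumptions for illustration.

```python
import numpy as np

def shuffled_split(x, y, val_fraction=0.33, seed=0):
    """Shuffle record indices once, then hold out a fraction for validation."""
    n_val = int(round(len(x) * val_fraction))
    if not 0 < n_val < len(x):
        raise ValueError("val_fraction leaves an empty training or validation set")
    order = np.random.default_rng(seed).permutation(len(x))
    train_idx, val_idx = order[:-n_val], order[-n_val:]
    return x[train_idx], y[train_idx], x[val_idx], y[val_idx]

# Toy stand-in with the same number of records as the question (312) but short signals.
x = np.arange(312 * 5, dtype=float).reshape(312, 5, 1)
y = np.repeat(np.arange(6), 52)  # labels sorted by class, the worst case for an unshuffled split
x_tr, y_tr, x_val, y_val = shuffled_split(x, y)
print(x_tr.shape, x_val.shape)
```

The held-out arrays could then be passed to model.fit through validation_data=(x_val, y_val) instead of validation_split.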

8 Answers


I would recommend taking several steps back and considering a much simpler approach.
Based on the following...

I have tried to increase layers, play with the learning rate, changing the loss function, changing the optimizer, scaling the data, normalizing the data, but nothing helped me to solve this problem. I also tried more simple Dense models or LSTM models but I can't find a way which works.

It doesn't sound like you have a strong understanding of your data and your tooling yet, which is fine because it's an opportunity to learn.

A few questions

  1. Do you have a baseline model? Have you tried just running a multinomial logistic regression? If not, I would strongly suggest starting there. Going through the feature engineering needed to build such a model will be invaluable as you increase the complexity of your model.

  2. Did you check for class imbalances?

  3. Why are you using a CNN? What do you want to accomplish with the convolutional layers? For example, when I construct a vision model for, let's say, classifying the shoes in my closet, I use several convolutional layers to extract spatial features such as edges and curves.

  4. Related to the third question: where did you get this architecture from? Is it from a publication? Is it the current state-of-the-art model for ECG traces, or just the most accessible one? Sometimes the two are not the same. I would dig into the literature and search the web a bit more for information about neural networks for analyzing ECG traces.

I think if you can answer these questions you will be able to solve your problem yourself.
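
As a rough illustration of the baseline idea in point 1, each record could be reduced to a handful of summary features before trying a simple classifier. The sketch below only builds the feature matrix; the choice of features (mean, standard deviation, minimum, maximum) is an assumption made for illustration, not something derived from this dataset.

```python
import numpy as np

def summary_features(signals):
    """Map (n_records, n_samples, 1) signals to (n_records, 4) summary features."""
    flat = np.asarray(signals, dtype=float).reshape(len(signals), -1)
    return np.column_stack([
        flat.mean(axis=1),
        flat.std(axis=1),
        flat.min(axis=1),
        flat.max(axis=1),
    ])

# Random toy data shaped like a scaled-down train_x (the real shape is (312, 18000, 1)).
rng = np.random.default_rng(0)
toy_x = rng.normal(size=(12, 100, 1))
features = summary_features(toy_x)
print(features.shape)  # (12, 4)
```

A feature matrix like this, together with the integer labels, is the kind of input a multinomial logistic regression (for example scikit-learn's LogisticRegression) expects.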


The current problem in your implementation is that, with data of shape (312,18000,1), you have only 312 samples, and with a validation split of 0.33 you are using only 209 samples for training.

Layer (type)                 Output Shape              Param #   
conv1d_1 (Conv1D)            (None, 17301, 100)        70100     
conv1d_2 (Conv1D)            (None, 16602, 50)         3500050   
dropout_1 (Dropout)          (None, 16602, 50)         0         
batch_normalization_1 (Batch (None, 16602, 50)         200       
activation_1 (Activation)    (None, 16602, 50)         0         
max_pooling1d_1 (MaxPooling1 (None, 4150, 50)          0         
flatten_1 (Flatten)          (None, 207500)            0         
dense_1 (Dense)              (None, 6)                 1245006   
Total params: 4,815,356
Trainable params: 4,815,256
Non-trainable params: 100

As model.summary() shows, your model has 4,815,256 trainable parameters, so it can easily overfit the training data. The issue is that you have far too many parameters to learn from so few samples. You can try to reduce your model size as shown below:

model = Sequential()
Layer (type)                 Output Shape              Param #   
conv1d_1 (Conv1D)            (None, 17999, 100)        300       
conv1d_2 (Conv1D)            (None, 17998, 10)         2010      
dropout_1 (Dropout)          (None, 17998, 10)         0         
batch_normalization_1 (Batch (None, 17998, 10)         40        
activation_1 (Activation)    (None, 17998, 10)         0         
max_pooling1d_1 (MaxPooling1 (None, 4499, 10)          0         
flatten_1 (Flatten)          (None, 44990)             0         
dense_1 (Dense)              (None, 6)                 269946    
Total params: 272,296
Trainable params: 272,276
Non-trainable params: 20
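
The parameter counts in the two summaries can be reproduced by hand. The sketch below is plain arithmetic, not Keras code; the kernel size of 2 for the smaller model and the pool size of 4 are inferred from the output shapes in the summaries, so treat them as assumptions.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights plus biases of a Conv1D layer."""
    return kernel_size * in_channels * filters + filters

def conv1d_out_len(length, kernel_size, stride=1):
    """Output length of a Conv1D layer with no padding."""
    return (length - kernel_size) // stride + 1

def dense_params(inputs, units):
    """Weights plus biases of a Dense layer."""
    return inputs * units + units

def total_params(kernel_size, filters1, filters2, length=18000, pool=4, classes=6):
    """Total parameters of Conv1D -> Conv1D -> BatchNorm -> MaxPool -> Flatten -> Dense."""
    len1 = conv1d_out_len(length, kernel_size)
    len2 = conv1d_out_len(len1, kernel_size)
    flat = (len2 // pool) * filters2
    return (conv1d_params(kernel_size, 1, filters1)
            + conv1d_params(kernel_size, filters1, filters2)
            + 4 * filters2  # batch norm: gamma, beta, moving mean, moving variance
            + dense_params(flat, classes))

print(total_params(kernel_size=700, filters1=100, filters2=50))  # 4815356
print(total_params(kernel_size=2, filters1=100, filters2=10))    # 272296
```

In both cases most of the remaining parameters sit in the second convolution and in the final Dense layer that follows Flatten, which is why shrinking the kernel and the filter count has such a large effect.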

As far as I know, you have three types of data: ecg, gsr and temp. So you can use train_x with shape (312,18000,3), and your train_y will be (312,6).
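
One way to build such a three-channel input is to stack the per-signal arrays along a new last axis. This is a minimal sketch with toy sizes; the array names `ecg`, `gsr` and `temp` are hypothetical stand-ins for however the three signals are actually stored.

```python
import numpy as np

n_records, n_samples = 4, 10  # toy sizes; the question has 312 records of 18000 samples

# Hypothetical per-signal arrays, one row per record.
ecg = np.zeros((n_records, n_samples))
gsr = np.ones((n_records, n_samples))
temp = np.full((n_records, n_samples), 2.0)

# Stack along a new last axis so each time step carries three channels.
train_x = np.stack([ecg, gsr, temp], axis=-1)
print(train_x.shape)  # (4, 10, 3)
```

With the real data, the first Conv1D layer would then take input_shape=(18000, 3).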

If the above solution does not work, then:

  1. Plot the class distribution of your dataset and check whether there is any class imbalance.
  2. As your model is overfitting the data, try to create more data (if you created this dataset yourself) or find a suitable data augmentation technique.
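
For the class distribution check in point 1, a quick count per emotion is usually enough before plotting anything. A minimal sketch, using the label encoding from the question and a made-up label list:

```python
from collections import Counter

EMOTIONS = ["anger", "calmness", "disgust", "fear", "happiness", "sadness"]

def class_distribution(labels):
    """Return {emotion: share of records} for integer labels 0..5."""
    counts = Counter(labels)
    total = len(labels)
    return {name: counts.get(i, 0) / total for i, name in enumerate(EMOTIONS)}

# Toy labels for illustration only; use the 312 per-record labels in practice.
toy_labels = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
for name, share in class_distribution(toy_labels).items():
    print(f"{name:10s} {share:.2f}")
```

Shares far from 1/6 would point to an imbalance worth handling, for example with class weights.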

I believe your code is correct, but as the commenter said, you are likely overfitting your data.

You might want to plot validation accuracy and training accuracy over the epochs to visualize this.
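
A minimal sketch of that comparison, assuming the `history.history` dictionary returned by model.fit with the metric names 'acc' and 'val_acc' used in the question. The numbers are copied from a few epochs of the log above; the plotting helper needs matplotlib and is not called here.

```python
def accuracy_gap(history):
    """Per-epoch difference between training and validation accuracy."""
    return [tr - va for tr, va in zip(history["acc"], history["val_acc"])]

def plot_accuracy(history):
    """Plot training and validation accuracy; needs matplotlib installed."""
    import matplotlib.pyplot as plt
    epochs = range(1, len(history["acc"]) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, history["acc"], label="training accuracy")
    ax.plot(epochs, history["val_acc"], label="validation accuracy")
    ax.set_xlabel("epoch")
    ax.set_ylabel("accuracy")
    ax.legend()
    return fig

# Values copied from epochs 1, 10, 20, 30, 40 and 50 of the log in the question.
history = {
    "acc":     [0.1406, 0.2972, 0.5422, 0.8233, 0.9598, 1.0000],
    "val_acc": [0.1587, 0.1905, 0.2222, 0.1587, 0.1429, 0.1587],
}
gaps = accuracy_gap(history)
print([round(g, 4) for g in gaps])
```

A gap that keeps widening while validation accuracy stays near chance level (1/6 for six classes) is the overfitting pattern described above.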

You should first see whether your overfitting issue improves with a simpler model. Note that this is unlikely to improve your overall performance, but your validation accuracy will more closely match whatever your training accuracy becomes. Another option would be to add a pooling layer immediately after your convolution layers.


You may try to add regularizer(s) (L1 or L2), check kernel_initializer and/or adjust the learning rate during training via callbacks. The example below is from a regression model.

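from keras import optimizers, regularizers
from keras.callbacks import ReduceLROnPlateau
from keras.layers import Dense
from keras.models import Sequential

# dims, x (L1 penalty), l (learning rate), epochs and batch_size are set elsewhere.
model = Sequential()
model.add(Dense(128, input_dim=dims, activation='relu'))
model.add(Dense(16, activation='relu', kernel_initializer='normal', kernel_regularizer=regularizers.l1(x)))
model.add(Dense(1, kernel_initializer='normal'))

# Adam is the class name; newer Keras versions take learning_rate instead of lr.
model.compile(optimizer=optimizers.Adam(lr=l), loss='mean_squared_error')

# Halve the learning rate when val_loss has not improved for 3 epochs.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', mode='min', factor=0.5, patience=3, min_lr=0.000001, verbose=1, cooldown=0)

history = model.fit(xtrain, ytrain, epochs=epochs, batch_size=batch_size, validation_split=0.3, callbacks=[reduce_lr])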

I suspect that, because of the way train_y was preprocessed, it no longer lines up properly with your train_x. My question would be: did you compress your train_y using some frequency-based technique?
If you compressed your labels (for each row) with a frequency-based technique, you have already introduced a high bias into your data. Do let me know how the compression was done! Thanks.


I would suggest the following:

  1. I see that the number of data points is small. The more complex the problem, the more data points a deep learning model needs to learn. Look for a similar dataset with a large amount of data, train your network on that dataset, and transfer it to your problem.

  2. Is there a way to augment the data? I see that your signal length is 18000. You can downsample the data by half using different techniques and augment the dataset. You would then be working with signals of length 9000.

  3. Try reducing the convolution kernel length to 3 or 5, and increase the model depth by adding another conv layer.

  4. I would strongly suggest trying random forests and gradient-boosted trees to see how they perform.
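
One possible reading of point 2 is sketched below: each 18000-sample record is split into its even- and odd-indexed samples, giving two half-rate signals of length 9000 with the same label. This is only one of several downsampling schemes, and the function name is hypothetical.

```python
import numpy as np

def augment_by_decimation(x, y):
    """Split each record into its even- and odd-indexed samples.

    x: (n_records, n_samples, channels), y: (n_records, n_classes).
    Returns twice as many records, each half as long.
    """
    even, odd = x[:, 0::2, :], x[:, 1::2, :]
    return np.concatenate([even, odd], axis=0), np.concatenate([y, y], axis=0)

# Toy data with the same shapes as the question's arrays.
x = np.zeros((312, 18000, 1), dtype=np.float32)
y = np.zeros((312, 6), dtype=np.float32)
x_aug, y_aug = augment_by_decimation(x, y)
print(x_aug.shape, y_aug.shape)  # (624, 9000, 1) (624, 6)
```

The two halves of a record are highly correlated, so both should stay on the same side of the train/validation split. Plain decimation without a low-pass filter can also alias high-frequency content.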


I faced an ECG problem when I did my final assignment in college a year ago, but with a different approach and different data (MIT-BIH).

It seems you are using a single lead, aren't you? Have you tried preparing the data beforehand, for example cleaning it up (mind the heartbeat noise)? My suggestion is not to combine all the data into one single list for training, since that can cause overfitting due to the nature of the human heartbeat; try training on subsets based on gender or age. According to some of the literature, this helps considerably.

Models often fail to work properly not because of a wrong implementation, but because of how well we prepare the data.


Your model is clearly overfitting the dataset. One suggestion that none of the commenters has mentioned is to increase the stride. Here you have kernel size = 700, no padding and stride = 1, so you obtain an output with shape (None, 17301, 100) from the first Conv layer.

I would try either increasing the stride to a number on the order of 50 to 100 (shifting your kernel by 2.33/(700/stride) seconds at each step) or inserting a pooling layer after each of the Conv layers.
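
To see the effect of the stride on the output length of the first layer, the arithmetic can be sketched as follows. It assumes no padding, as in the question, and the sampling rate follows from 18000 rows covering 60 seconds.

```python
SAMPLE_RATE_HZ = 18000 / 60  # 18000 rows cover 60 seconds, per the question

def conv1d_out_len(length, kernel_size, stride=1):
    """Output length of a Conv1D layer with no padding."""
    return (length - kernel_size) // stride + 1

for stride in (1, 50, 100):
    steps = conv1d_out_len(18000, 700, stride)
    shift_s = stride / SAMPLE_RATE_HZ
    print(f"stride={stride:3d}: {steps:5d} output steps, kernel shifts {shift_s:.3f} s per step")
```

In Keras the stride is set through the strides argument of Conv1D; going from 1 to 50 reduces the first layer's output from 17301 steps to 347.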