
I am training an LSTM model on the SemEval 2017 Task 4A dataset (a classification problem with 3 classes). I observe that the validation loss first decreases, then suddenly increases by a significant amount, and then decreases again, oscillating in an almost sinusoidal way, as can be seen in the training epochs below.

The training set (image of the training log)

Here is the code for my model:

from keras.models import Sequential
from keras.layers import Embedding, Activation, Dropout, Bidirectional, LSTM, Dense

model = Sequential()
model.add(Embedding(max_words, 30, input_length=max_len))  # 30-dimensional word embeddings
model.add(Activation('tanh'))
model.add(Dropout(0.3))
model.add(Bidirectional(LSTM(32)))  # 32 units per direction, 64-dimensional output
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(3, activation='sigmoid'))  # one output per class
model.summary()

And here is the model summary

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_2 (Embedding)      (None, 300, 30)           60000     
_________________________________________________________________
batch_normalization_3 (Batch (None, 300, 30)           120       
_________________________________________________________________
activation_3 (Activation)    (None, 300, 30)           0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 300, 30)           0         
_________________________________________________________________
bidirectional_2 (Bidirection (None, 64)                16128     
_________________________________________________________________
batch_normalization_4 (Batch (None, 64)                256       
_________________________________________________________________
activation_4 (Activation)    (None, 64)                0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 65        
=================================================================
Total params: 76,569
Trainable params: 76,381
Non-trainable params: 188

I am using GloVe word embeddings, the Adam optimizer, and categorical crossentropy as the loss function.
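The post does not show how the GloVe vectors are wired in, so for reference here is a minimal sketch of how pretrained GloVe embeddings are typically loaded into a Keras Embedding layer. The glove.6B.100d.txt file, the 100-dimensional size, the tokenizer variable, and trainable=False are assumptions for illustration, not something taken from the original code:

import numpy as np
from keras.layers import Embedding

# Assumption: `tokenizer` is the fitted Keras Tokenizer used to build the
# padded input sequences, and the GloVe file is available locally.
embedding_dim = 100
embeddings_index = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype='float32')

# Build a (max_words, embedding_dim) matrix with one GloVe vector per known word.
embedding_matrix = np.zeros((max_words, embedding_dim))
for word, i in tokenizer.word_index.items():
    if i < max_words and word in embeddings_index:
        embedding_matrix[i] = embeddings_index[word]

# This layer would replace the plain Embedding(...) in the model above;
# trainable=False keeps the pretrained vectors frozen.
embedding_layer = Embedding(max_words, embedding_dim,
                            weights=[embedding_matrix],
                            input_length=max_len,
                            trainable=False)

# Compile step matching the optimizer and loss mentioned above.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])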

After changing the loss function and Dense layer, here is the training phase.

Train on 16711 samples, validate on 1857 samples
Epoch 1/5
16711/16711 [==============================] - 55s 3ms/step - loss: 0.5976 - accuracy: 0.7456 - val_loss: 0.9060 - val_accuracy: 0.6182
Epoch 2/5
16711/16711 [==============================] - 54s 3ms/step - loss: 0.5872 - accuracy: 0.7521 - val_loss: 0.8919 - val_accuracy: 0.6144
Epoch 3/5
16711/16711 [==============================] - 54s 3ms/step - loss: 0.5839 - accuracy: 0.7518 - val_loss: 0.9067 - val_accuracy: 0.6187
Epoch 4/5
16711/16711 [==============================] - 54s 3ms/step - loss: 0.5766 - accuracy: 0.7554 - val_loss: 0.9437 - val_accuracy: 0.6268
Epoch 5/5
16711/16711 [==============================] - 54s 3ms/step - loss: 0.5742 - accuracy: 0.7544 - val_loss: 0.9272 - val_accuracy: 0.6166

Testing phase

accr = model.evaluate(test_sequences_matrix, Y_test)
2064/2064 [==============================] - 2s 1ms/step

print('Test set\n  Loss: {:0.3f}\n  Accuracy: {:0.3f}'.format(accr[0],accr[1]))
Test set
  Loss: 0.863
  Accuracy: 0.649

Confusion matrix

Confusion Matrix :
[[517 357 165]
 [379 246 108]
 [161  88  43]]
Accuracy Score : 0.3905038759689923

Classification report

              precision    recall  f1-score   support

           0       0.49      0.50      0.49      1039
           1       0.36      0.34      0.35       733
           2       0.14      0.15      0.14       292

    accuracy                           0.39      2064
   macro avg       0.33      0.33      0.33      2064
weighted avg       0.39      0.39      0.39      2064


Confusion matrix code (I have already imported confusion_matrix, accuracy_score, and classification_report from sklearn.metrics):

results = confusion_matrix(doc_test.response, Y_test)
print('Confusion Matrix :')
print(results)
print('Accuracy Score :', accuracy_score(doc_test.response, Y_test))
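Note that the snippet above compares doc_test.response against Y_test; a confusion matrix is normally built from the ground-truth labels and the model's predicted classes. A minimal sketch of that, assuming Y_test is one-hot encoded and test_sequences_matrix is the padded test input used in model.evaluate above:

import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

# Predicted class ids from the model outputs, and true class ids recovered
# from the one-hot Y_test (assumption: Y_test has shape (n_samples, 3)).
y_pred = np.argmax(model.predict(test_sequences_matrix), axis=1)
y_true = np.argmax(Y_test, axis=1)

print('Confusion Matrix :')
print(confusion_matrix(y_true, y_pred))
print('Accuracy Score :', accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred))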
It's pretty standard to use ReLU, is there any specific reason you're using tanh? – Zabir Al Nazi
Is it a classification problem? What are the possible values of the labels? – Pedrolarben
@Pedrolarben yes, it is a classification problem. The labels are +ve: 1, -ve: 2 and neutral: 0. – Amber Bhanarkar
@ZabirAlNazi no specific reason for tanh, but I tried ReLU and it didn't give any better results either; rather, it made things worse. The maximum validation loss was 95.5932, the minimum was -3.2638, and for 14 of 20 epochs the validation accuracy stayed constant at 0.3317 (33.17%). – Amber Bhanarkar

1 Answer


When you have more than two classes, you cannot use binary crossentropy. Change your loss function to categorical crossentropy and set your output layer to have three neurons (one for each class):

from keras.layers import BatchNormalization

model = Sequential()
model.add(Embedding(max_words, 30, input_length=max_len))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.3))
model.add(Bidirectional(LSTM(32)))
model.add(BatchNormalization())
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax'))  # one neuron per class

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
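The prose above says categorical crossentropy while the compile call uses sparse_categorical_crossentropy; both fit a 3-class problem, the difference is only the label format. sparse_categorical_crossentropy expects integer labels (0, 1, 2), whereas categorical_crossentropy expects one-hot vectors. A minimal sketch of the one-hot route; doc_train.response is an assumed name, mirroring the doc_test.response used in the question:

from keras.utils import to_categorical

# Convert integer labels (0, 1, 2) to one-hot vectors of shape (n_samples, 3).
Y_train = to_categorical(doc_train.response, num_classes=3)
Y_test = to_categorical(doc_test.response, num_classes=3)

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])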

Anyway, from your training curves I can see that the network is overfitting. It may be due to your data or to your network architecture. Check this post to learn more about overfitting in deep learning models.

This is the plot of your model's learning curve. It shows classic overfitting behaviour.
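If overfitting is the culprit, one common first step is early stopping on the validation loss. A minimal sketch using a Keras callback; train_sequences_matrix, Y_train, the epoch count, and the patience value are assumptions, not part of the original post:

from keras.callbacks import EarlyStopping

# Stop when val_loss has not improved for 2 consecutive epochs and
# roll back to the best weights seen so far.
early_stop = EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)

model.fit(train_sequences_matrix, Y_train,
          validation_split=0.1,
          epochs=20,
          callbacks=[early_stop])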