1 vote

I am trying to use a CNN architecture to classify text sentences. The architecture of the network is as follows:

from tensorflow.keras.layers import Input, Conv1D, Dropout, MaxPooling1D, Dense, Flatten
from tensorflow.keras.models import Model

text_input = Input(shape=X_train_vec.shape[1:], name="Text_input")

conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(text_input)
drop21 = Dropout(0.5)(conv2)
pool1 = MaxPooling1D(pool_size=2)(drop21)
conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(pool1)
drop22 = Dropout(0.5)(conv22)
pool2 = MaxPooling1D(pool_size=2)(drop22)
dense = Dense(16, activation='relu')(pool2)

flat = Flatten()(dense)
dense = Dense(128, activation='relu')(flat)
out = Dense(32, activation='relu')(dense)

outputs = Dense(y_train.shape[1], activation='softmax')(out)

model = Model(inputs=text_input, outputs=outputs)
# compile
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

I also use some callbacks, such as EarlyStopping and ReduceLROnPlateau, to stop the training and to reduce the learning rate when the validation loss stops improving:

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

early_stopping = EarlyStopping(monitor='val_loss',
                               patience=5)
model_checkpoint = ModelCheckpoint(filepath=checkpoint_filepath,
                                   save_weights_only=False,
                                   monitor='val_loss',
                                   mode="auto",
                                   save_best_only=True)
learning_rate_decay = ReduceLROnPlateau(monitor='val_loss', 
                                        factor=0.1, 
                                        patience=2, 
                                        verbose=1, 
                                        mode='auto',
                                        min_delta=0.0001, 
                                        cooldown=0,
                                        min_lr=0)
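
For reference, these callbacks are passed to model.fit roughly like this (the batch size and validation split below are just illustrative values, not my exact settings):

history = model.fit(X_train_vec, y_train,
                    validation_split=0.2,  # or a dedicated validation set
                    epochs=50,
                    batch_size=64,
                    callbacks=[early_stopping, model_checkpoint, learning_rate_decay])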

Once the model is trained, the training history looks as follows:

[training history plot: training vs. validation loss/accuracy per epoch]

We can observe that the validation loss stops improving from epoch 5 onwards, while the training loss keeps decreasing, i.e. the model is overfitting the training set.

I would like to know if I am doing something wrong in the architecture of the CNN. Aren't the dropout layers enough to avoid overfitting? What are other ways to reduce overfitting?

Any suggestions?

Thanks in advance.


Edit:

I have also tried regularization, and the results were even worse:

kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)
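
For example, adding them to the first convolutional layer looks like this (the Dense layers accept the same arguments; where exactly they are added is a choice):

from tensorflow.keras.regularizers import l2

conv2 = Conv1D(filters=128, kernel_size=5, activation='relu',
               kernel_regularizer=l2(0.01),
               bias_regularizer=l2(0.01))(text_input)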

[training history plot with L2 regularization]


Edit 2:

I have tried applying BatchNormalization layers after each convolution, and the result is the following:

norm = BatchNormalization()(conv2)
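
More completely, the wiring looks roughly like this, with the normalization between each convolution and its dropout (placing it before the dropout is simply the option shown here):

from tensorflow.keras.layers import BatchNormalization

conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(text_input)
norm1 = BatchNormalization()(conv2)
drop21 = Dropout(0.5)(norm1)
pool1 = MaxPooling1D(pool_size=2)(drop21)
conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(pool1)
norm2 = BatchNormalization()(conv22)
drop22 = Dropout(0.5)(norm2)
pool2 = MaxPooling1D(pool_size=2)(drop22)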

[training history plot with BatchNormalization]


Edit 3:

After applying the LSTM architecture:

from tensorflow.keras.layers import Bidirectional, LSTM

text_input = Input(shape=X_train_vec.shape[1:], name="Text_input")

conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(text_input)
drop21 = Dropout(0.5)(conv2)
conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(drop21)
drop22 = Dropout(0.5)(conv22)

lstm1 = Bidirectional(LSTM(128, return_sequences = True))(drop22)
lstm2 = Bidirectional(LSTM(64, return_sequences = True))(lstm1)

flat = Flatten()(lstm2)
dense = Dense(128, activation='relu')(flat)
out = Dense(32, activation='relu')(dense)

outputs = Dense(y_train.shape[1], activation='softmax')(out)

model = Model(inputs=text_input, outputs=outputs)
# compile
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

[training history plot with the CNN + BiLSTM model]

Comments:
The most important question is: "How large is your dataset?" This seems like a very small dataset. If that is the case, your primary reaction should be to gather more data. – Felix Kleine Bösing
It is not small. I have about 40,000 samples. – IMB
Then there must be some other problem. Since the validation loss is barely decreasing, it's not learning anything useful. How many classes do you have? Are your classes balanced? – Felix Kleine Bösing
There are two classes and the distribution is 58% / 42%. – IMB
Hmm. It's hard to guess without further information, but I still suppose that there might be something wrong with the data. Is this a custom project? What are you trying to predict? – Felix Kleine Bösing

4 Answers

2 votes

Overfitting can be caused by many factors; it happens when your model fits the training set too well.

To handle it, you can try the following:

  1. Add more data.
  2. Use data augmentation.
  3. Use architectures that generalize well.
  4. Add regularization (mostly dropout; L1/L2 regularization is also possible).
  5. Reduce the architecture's complexity.

For more details, you can read https://towardsdatascience.com/deep-learning-3-more-on-cnns-handling-overfitting-2bd5d99abe5d
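
As a rough illustration of points 4 and 5 applied to the model in the question (not a tuned model; the filter counts and the 0.01 L2 factor are only placeholders):

from tensorflow.keras.layers import Input, Conv1D, Dropout, MaxPooling1D, GlobalMaxPooling1D, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2

text_input = Input(shape=X_train_vec.shape[1:], name="Text_input")
x = Conv1D(filters=64, kernel_size=5, activation='relu', kernel_regularizer=l2(0.01))(text_input)
x = Dropout(0.5)(x)
x = MaxPooling1D(pool_size=2)(x)
x = Conv1D(filters=32, kernel_size=5, activation='relu', kernel_regularizer=l2(0.01))(x)
x = Dropout(0.5)(x)
x = GlobalMaxPooling1D()(x)  # replaces Flatten + the large Dense stack
x = Dense(64, activation='relu')(x)
outputs = Dense(y_train.shape[1], activation='softmax')(x)
model = Model(inputs=text_input, outputs=outputs)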

1 vote

This is screaming transfer learning. The google-universal-sentence-encoder is perfect for this use case. Replace your model with:

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # registers the ops required by the multilingual encoder
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

text_input = Input(shape=X_train_vec.shape[1:], name = "Text_input")

# this next layer might need some tweaking dimension wise, to correctly fit
# X_train in the model; keep `text_input` untouched so it can be passed to Model() below
squeezed = tf.keras.layers.Lambda(lambda x: tf.squeeze(x))(text_input)
# conv2 = Conv1D(filters=128, kernel_size=5, activation='relu')(text_input)
# drop21 = Dropout(0.5)(conv2)
# pool1 = MaxPooling1D(pool_size=2)(drop21)
# conv22 = Conv1D(filters=64, kernel_size=5, activation='relu')(pool1)
# drop22 = Dropout(0.5)(conv22)
# pool2 = MaxPooling1D(pool_size=2)(drop22)

# 1) you might need `squeezed = tf.expand_dims(squeezed, axis=0)` here
# 2) If you're classifying English only, you can use the link to the normal `google-universal-sentence-encoder`, not the multilingual one
# 3) both the English and multilingual have a `-large` version. More accurate but slower to train and infer. 
embedded = hub.KerasLayer('https://tfhub.dev/google/universal-sentence-encoder-multilingual/3')(squeezed)

# this layer seems out of place, 
# dense = Dense(16, activation='relu')(embedded) 

# you don't need to flatten after a dense layer (in your case) or a backbone (in my case (google-universal-sentence-encoder))
# flat = Flatten()(dense)

dense = Dense(128, activation='relu')(embedded)
out = Dense(32, activation='relu')(dense)

outputs = Dense(y_train.shape[1], activation='softmax')(out)

model = Model(inputs=text_input, outputs=outputs)
0 votes

I think that since you are doing text classification, adding one or two LSTM layers might help the network learn better, since it will be able to better associate with the context of the data. I suggest adding the following code before the Flatten layer.

lstm1 = Bidirectional(LSTM(128, return_sequences=True))(drop22)  # note: return_sequences (plural)
lstm2 = Bidirectional(LSTM(64))(lstm1)

LSTM layers can help a neural network learn associations between certain words and might improve the accuracy of your network.

I also suggest dropping the max-pooling layers, as max pooling, especially in text classification, can lead the network to drop some useful features. Just keep the convolutional layers and the dropout. Also remove the Dense layer before Flatten and add the aforementioned LSTMs.

0 votes

It is unclear how you feed the text into your model. I am assuming that you tokenize the text to represent it as a sequence of integers, but do you use any word embedding prior to feeding it into your model? If not, I suggest you put a trainable TensorFlow Embedding layer at the start of your model. There is a clever technique called embedding lookup to speed up its training, but you can save that for later. Try adding this layer to your model; then your Conv1D layers would have a much easier time working on a sequence of floats. Also, I suggest adding BatchNormalization after each Conv1D; it should help to speed up convergence and training.
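
A rough sketch of what I mean, assuming your inputs are padded integer token sequences (vocab_size, maxlen, and the embedding dimension are placeholders to take from your own tokenizer and padding step):

from tensorflow.keras.layers import (Input, Embedding, Conv1D, BatchNormalization,
                                     Dropout, GlobalMaxPooling1D, Dense)
from tensorflow.keras.models import Model

vocab_size, maxlen, embed_dim = 20000, 100, 128  # hypothetical values

text_input = Input(shape=(maxlen,), name="Text_input")
x = Embedding(input_dim=vocab_size, output_dim=embed_dim)(text_input)  # trainable by default
x = Conv1D(filters=128, kernel_size=5, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)
x = Conv1D(filters=64, kernel_size=5, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)
x = GlobalMaxPooling1D()(x)
x = Dense(128, activation='relu')(x)
outputs = Dense(y_train.shape[1], activation='softmax')(x)
model = Model(inputs=text_input, outputs=outputs)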