Validation loss not changing in Resnet

Question

My data has shape (25000, 178, 178, 3): 25000 samples, each with 3 different color channels (not RGB). Around 21k samples have label 0 and the remaining 4k have label 1. Here's one of my samples:

array([[[[1.79844797e-01, 1.73587397e-01, 1.73587397e-01, ...,
          4.84393053e-02, 5.15680127e-02, 5.46967126e-02],
         [1.76716089e-01, 1.79844797e-01, 1.82973504e-01, ...,
          5.15680127e-02, 5.31323589e-02, 5.15680127e-02],
         [1.81409150e-01, 1.86102197e-01, 1.81409150e-01, ...,
          5.15680127e-02, 5.31323589e-02, 5.15680127e-02]]],


       [[[2.51065755e+00, 2.53197193e+00, 2.53197193e+00, ...,
          1.88543844e+00, 1.89964795e+00, 1.90675282e+00],
         [2.51776242e+00, 2.52486706e+00, 2.53197193e+00, ...,
          1.89964795e+00, 1.90675282e+00, 1.90675282e+00],
         [2.53197193e+00, 2.51776242e+00, 2.52486706e+00, ...,
          1.91385746e+00, 1.90675282e+00, 1.90675282e+00]]],


       [[[7.13270283e+00, 7.11016369e+00, 7.13270283e+00, ...,
          4.85625362e+00, 4.90133190e+00, 4.94641018e+00],
         [7.08762503e+00, 7.08762503e+00, 7.08762503e+00, ...,
          4.92387104e+00, 4.96894932e+00, 4.96894932e+00],
         [7.08762503e+00, 7.08762503e+00, 7.06508589e+00, ...,
          4.99148846e+00, 4.96894932e+00, 4.96894932e+00]]],
      dtype=float32)

First, I normalize the data. Since each color channel is completely different, I normalize per channel as follows (data_array is my whole dataset):

def nan(index):
    data_array[:, :, :, index] = (data_array[:, :, :, index] - np.min(data_array[:, :, :, index]))/(np.max(data_array[:, :, :, index]) - np.min(data_array[:, :, : ,index]))
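
For reference, the same per-channel min-max scaling can be written for all channels at once. This is only a minimal sketch, assuming a channels-last float array; `minmax_per_channel` and its `eps` guard against a constant channel are hypothetical additions, not part of the original post, and the demo array is synthetic.

```python
import numpy as np

def minmax_per_channel(data, eps=1e-12):
    """Scale each channel of a channels-last array to [0, 1] independently."""
    # Reduce over every axis except the last (channel) axis.
    axes = tuple(range(data.ndim - 1))
    ch_min = data.min(axis=axes, keepdims=True)
    ch_max = data.max(axis=axes, keepdims=True)
    # eps (an assumption) avoids division by zero if a channel is constant.
    return (data - ch_min) / (ch_max - ch_min + eps)

# Tiny synthetic stand-in for the (25000, 178, 178, 3) dataset,
# with channels on different scales as in the question.
rng = np.random.default_rng(0)
scales = np.array([0.2, 2.5, 7.0], dtype=np.float32)
demo = rng.random((4, 8, 8, 3)).astype(np.float32) * scales
scaled = minmax_per_channel(demo)
```

Unlike the helper above, this version returns a new array instead of modifying data_array in place.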

Splitting for training, validation and testing:

rand_indices = np.random.permutation(len(data))
train_indices = rand_indices[0:19000]
valid_indices = rand_indices[19000:21000]
test_indices = rand_indices[21000:len(data)]

x_val = data_array[valid_indices, :]
y_val = EDR[[valid_indices]].astype('float')

x_train = data_array[train_indices, :]
y_train = EDR[[train_indices]].astype('float')

x_test = data_array[test_indices, :]
y_test = EDR[[test_indices]].astype('float')

Then I fit an ImageDataGenerator on the training data like this:

gen = ImageDataGenerator(
            rotation_range=40,
            zoom_range=0.2,
            shear_range=0.2,
            width_shift_range=0.2,
            height_shift_range=0.2,
            fill_mode='nearest',
            horizontal_flip=True,
    )
gen.fit(x_train)

Then I train a ResNet on the data as follows:

img_height,img_width = 178, 178 
num_classes = 2

base_model = applications.resnet.ResNet101(weights= None, include_top=False, input_shape= (img_height,img_width,3))

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.7)(x)
predictions = Dense(1, activation= 'sigmoid')(x)
model = Model(inputs = base_model.input, outputs = predictions)

initial_learning_rate = 0.001
def lr_step_decay(epoch, lr):
    drop_rate = 0.5
    epochs_drop = 10.0
    return initial_learning_rate * math.pow(drop_rate, math.floor(epoch/epochs_drop))

sgd = tf.keras.optimizers.SGD(lr = 0.001, momentum = 0.9, decay = 1e-6, nesterov=False)
opt_rms = optimizers.RMSprop(lr=0.001,decay=1e-6)

model.compile(loss = 'binary_crossentropy', optimizer = sgd, metrics = ['accuracy'])
history = model.fit_generator(gen.flow(x_train, y_train, batch_size = 64), 64, epochs = 30, verbose=1, validation_data=(x_val, y_val),
                   callbacks=[LearningRateScheduler(lr_step_decay)])

And here's how my model is training:

Epoch 1/30
64/64 [==============================] - 46s 713ms/step - loss: 0.5535 - accuracy: 0.8364 - val_loss: 6.0887 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 2/30
64/64 [==============================] - 43s 671ms/step - loss: 0.4661 - accuracy: 0.8562 - val_loss: 0.6467 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 3/30
64/64 [==============================] - 43s 673ms/step - loss: 0.4430 - accuracy: 0.8640 - val_loss: 0.4231 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 4/30
64/64 [==============================] - 45s 699ms/step - loss: 0.4327 - accuracy: 0.8674 - val_loss: 0.3895 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 5/30
64/64 [==============================] - 43s 677ms/step - loss: 0.4482 - accuracy: 0.8559 - val_loss: 0.3607 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 6/30
64/64 [==============================] - 43s 678ms/step - loss: 0.3857 - accuracy: 0.8677 - val_loss: 0.4244 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 7/30
64/64 [==============================] - 43s 677ms/step - loss: 0.4308 - accuracy: 0.8623 - val_loss: 0.4049 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 8/30
64/64 [==============================] - 43s 677ms/step - loss: 0.3776 - accuracy: 0.8711 - val_loss: 0.3580 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 9/30
64/64 [==============================] - 43s 677ms/step - loss: 0.4005 - accuracy: 0.8672 - val_loss: 0.3689 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 10/30
64/64 [==============================] - 43s 676ms/step - loss: 0.3977 - accuracy: 0.8828 - val_loss: 0.3513 - val_accuracy: 0.8760 - lr: 0.0010
Epoch 11/30
64/64 [==============================] - 43s 675ms/step - loss: 0.4394 - accuracy: 0.8682 - val_loss: 0.3491 - val_accuracy: 0.8760 - lr: 5.0000e-04
Epoch 12/30
64/64 [==============================] - 43s 676ms/step - loss: 0.3702 - accuracy: 0.8779 - val_loss: 0.3676 - val_accuracy: 0.8760 - lr: 5.0000e-04
Epoch 13/30
64/64 [==============================] - 43s 678ms/step - loss: 0.3904 - accuracy: 0.8706 - val_loss: 0.3621 - val_accuracy: 0.8760 - lr: 5.0000e-04
Epoch 14/30
64/64 [==============================] - 43s 677ms/step - loss: 0.3579 - accuracy: 0.8765 - val_loss: 0.3483 - val_accuracy: 0.8760 - lr: 5.0000e-04

My validation accuracy is not changing at all; it stays constant. The model is probably predicting everything as 0, because 0.8760 is exactly the validation accuracy you would get by predicting all 0s given the split (248 1's out of 2k validation records). Can someone tell me what I'm doing wrong here?
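
The constant-prediction reading of the log can be checked with a few lines of arithmetic. The sketch below uses only the counts stated above (248 positives out of 2000 validation records). The second quantity, the binary cross-entropy of a model that always outputs the positive-class frequency, is added here only as a rough point of comparison for the logged val_loss.

```python
import math

# Counts taken from the question: 248 positives among 2000 validation records.
n_val = 2000
n_pos = 248

# Accuracy of a classifier that always outputs class 0.
baseline_acc = (n_val - n_pos) / n_val
print(f"{baseline_acc:.4f}")  # 0.8760

# Binary cross-entropy of a model that always predicts p = n_pos / n_val.
p = n_pos / n_val
prior_loss = -(p * math.log(p) + (1 - p) * math.log(1 - p))
print(f"{prior_loss:.3f}")
```

The first value matches the logged val_accuracy exactly, which is consistent with every validation sample being assigned to class 0.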

Sample plot of one file with 5 time dims (I'm using just 1 for training) and 1 channel from the data:

As you said, your data is imbalanced, which might be why your model is predicting everything as 0. I don't know how to avoid this, but I think using weights = 'imagenet' would help. - Adarsh Wase
@AdarshWase please check the sample image I have attached. I have satellite images, so I think adding ImageNet weights won't help. Please correct me if I'm wrong. - Chris_007

Timbus Calin · Accepted Answer · 2021-06-29T09:07:42

Your observation is correct: the network is not learning anything.

Ensure that your dataset is properly labelled and that you feed the data to the network correctly. Also ask yourself: is 178x178 a sufficient resolution for the "other" class that I am trying to detect? If you have already checked these points, proceed to the following suggestions.
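
One way to run the labelling and feeding checks is a small helper applied to each split before training. This is a minimal sketch, not a complete validation suite: `check_split` is a hypothetical name, the demo arrays are synthetic, and the specific checks (one label per image, labels in {0, 1}, finite inputs) are assumptions about what "fed correctly" means here.

```python
import numpy as np

def check_split(x, y, name):
    """Return (n_samples, positive_fraction) after basic consistency checks."""
    x = np.asarray(x)
    y = np.asarray(y).ravel()
    # One label per image, and labels restricted to {0, 1}.
    assert x.shape[0] == y.shape[0], f"{name}: {x.shape[0]} images vs {y.shape[0]} labels"
    assert set(np.unique(y).tolist()) <= {0.0, 1.0}, f"{name}: unexpected label values"
    # Inputs should not contain NaN or inf after normalization.
    assert np.isfinite(x).all(), f"{name}: NaN or inf in inputs"
    pos_frac = float(y.mean())
    print(f"{name}: n={y.shape[0]}, positives={pos_frac:.3f}")
    return y.shape[0], pos_frac

# Synthetic stand-in with the same imbalance as the question's validation split.
x_demo = np.zeros((2000, 4, 4, 3), dtype=np.float32)
y_demo = np.array([1.0] * 248 + [0.0] * 1752)
n, frac = check_split(x_demo, y_demo, "val")
```

With the arrays from the question, it would be called as check_split(x_train, y_train, "train") and likewise for the validation and test splits.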

I would start by decreasing the learning rate to 0.0001 or 0.00001 (although at that point training may converge too slowly).
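
Note that the question passes LearningRateScheduler(lr_step_decay), and that schedule computes its value from initial_learning_rate, so lowering only the lr argument of SGD would likely have no effect. The sketch below parameterizes the same step decay by its starting rate; `make_step_decay` and `lr_step_decay_low` are hypothetical names introduced for illustration.

```python
import math

def make_step_decay(initial_lr, drop_rate=0.5, epochs_drop=10):
    """Build a schedule with the (epoch, lr) signature used by LearningRateScheduler."""
    def schedule(epoch, lr=None):
        # `lr` (the optimizer's current rate) is ignored, as in the question's lr_step_decay.
        return initial_lr * math.pow(drop_rate, math.floor(epoch / epochs_drop))
    return schedule

# Same halving every 10 epochs, but starting from 1e-4 instead of 1e-3.
lr_step_decay_low = make_step_decay(1e-4)
rates = [lr_step_decay_low(e) for e in (0, 9, 10, 20)]
```

The resulting function can be passed as callbacks=[LearningRateScheduler(lr_step_decay_low)] in place of the original schedule.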

At the same time, remove the Dropout() layer altogether to see whether your network is able to learn anything at all. At this stage of the investigation Dropout() is not needed; with a rate as high as 0.7 it actually hampers learning.
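
To get a feel for how aggressive a rate of 0.7 is, here is a NumPy illustration of inverted dropout (zero each unit with probability `rate`, scale the survivors by 1 / (1 - rate)). It is a sketch of the general technique, not Keras's actual implementation; the batch of 64 and the 2048 pooled features are assumptions based on the question's batch size and the ResNet101 output width.

```python
import numpy as np

def inverted_dropout(features, rate, rng):
    """Zero each feature with probability `rate`; rescale survivors by 1 / (1 - rate)."""
    keep_mask = rng.random(features.shape) >= rate
    return features * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
# One batch of pooled features feeding the final Dense(1) layer (assumed sizes).
pooled = np.ones((64, 2048), dtype=np.float32)
dropped = inverted_dropout(pooled, rate=0.7, rng=rng)

# Fraction of features the classifier never sees in this training step.
zero_fraction = float((dropped == 0).mean())
print(f"zeroed: {zero_fraction:.2f}")
```

Roughly 70% of the inputs to the single output unit are discarded at every training step, which is why removing the layer is a reasonable first experiment.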
