I was training a model for my project on Optical Communication on Colab when something odd happened. The model I trained first reached close to 99% training and 97% validation accuracy, but the runtime expired overnight. After reconnecting to the runtime, I tried re-training the same model, and now the accuracy stays constant at 25% from the first epoch onwards. There are 4 categories, and the model assigns roughly 0.25 to every one of them; the loss also sits at about 1.386, which is ln(4), i.e. chance level. I'm not sure what is causing this, because after a few restarts the model briefly showed performance similar to the original run, but now it is back to 25% accuracy. Please refer to the image and the model specs below.
import tensorflow as tf

# 1D CNN feature extractor followed by a dense classifier head (4 classes)
model_fm = tf.keras.Sequential([
    tf.keras.layers.Conv1D(256, kernel_size=3, activation='relu', input_shape=x_train.shape[1:]),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Conv1D(128, kernel_size=3, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Conv1D(64, kernel_size=3, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(4, activation='softmax')  # 4 output classes
])

model_fm.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy'])

model_fm.fit(x_train, y_train, batch_size=256, verbose=1, epochs=60,
             validation_data=(x_val, y_val), callbacks=[earlystopping, reduce_lr])
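For completeness, the two callbacks referenced in the fit call are along these lines (I'm reconstructing them here; the exact patience/factor values are assumptions and not the point of the question):

# Assumed callback definitions (typical settings, not necessarily the exact ones from my notebook)
earlystopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=10, restore_best_weights=True)
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6)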
Earlier progress:
Epoch 1/60 612/612 [==============================] - 170s 275ms/step - loss: 0.9359 - accuracy: 0.5621 - val_loss: 0.7793 - val_accuracy: 0.6299
Epoch 2/60 612/612 [==============================] - 168s 274ms/step - loss: 0.5998 - accuracy: 0.7369 - val_loss: 0.4597 - val_accuracy: 0.8002
Epoch 3/60 612/612 [==============================] - 173s 284ms/step - loss: 0.4464 - accuracy: 0.8078 - val_loss: 0.3138 - val_accuracy: 0.8693
Epoch 4/60 612/612 [==============================] - 174s 284ms/step - loss: 0.3427 - accuracy: 0.8578 - val_loss: 0.2393 - val_accuracy: 0.9037
After restarting runtime:
Epoch 1/60 409/409 [==============================] - 112s 273ms/step - loss: 1.3865 - accuracy: 0.2493 - val_loss: 1.3862 - val_accuracy: 0.2594
Epoch 2/60 409/409 [==============================] - 111s 271ms/step - loss: 1.3863 - accuracy: 0.2501 - val_loss: 1.3864 - val_accuracy: 0.2435
P.S. Ignore the change in the number of training samples in the latter case; the model showed the same 25% accuracy on the entire dataset as well. I thought training on a smaller number of samples might ease the situation, but it didn't. Your help is very much appreciated.
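In case it helps, this is the kind of quick check I mean when I say every sample gets roughly 0.25 per class. It's just a sketch run after the stalled training, not part of my training code, and the variable names here (probs) are placeholders:

import numpy as np

# Predicted probabilities after the stalled run
probs = model_fm.predict(x_val, batch_size=256)   # shape (num_samples, 4)
print(probs[:5])                                   # each row is roughly [0.25, 0.25, 0.25, 0.25]
print(np.unique(np.argmax(probs, axis=1), return_counts=True))  # spread of predictions over the 4 classes

# Chance-level categorical cross-entropy for 4 balanced classes is ln(4) ~ 1.386,
# which matches the stuck loss values above.
print(np.log(4))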