I've built a neural network model for a binary classification problem using Keras. Here's the code:

from keras import models, layers

# create a new model
nn_model = models.Sequential()

# add the input and hidden dense layers
# 128 is the number of hidden units; 22 is the number of input features
nn_model.add(layers.Dense(128, activation='relu', input_shape=(22,)))
nn_model.add(layers.Dense(16, activation='relu'))
nn_model.add(layers.Dense(16, activation='relu'))

# add a final layer
nn_model.add(layers.Dense(1, activation='sigmoid'))

# 3000 rows are held out from the training set to monitor accuracy and loss
# compile and train the model
nn_model.compile(optimizer='rmsprop',
                 loss='binary_crossentropy',
                 metrics=['acc'])

history = nn_model.fit(partial_x_train,
                       partial_y_train,
                       epochs=20,
                       batch_size=512,  # number of samples per gradient update
                       validation_data=(x_val, y_val))

Here's the training log:

Train on 42663 samples, validate on 3000 samples
Epoch 1/20
42663/42663 [==============================] - 0s 9us/step - loss: 0.2626 - acc: 0.8960 - val_loss: 0.2913 - val_acc: 0.8767
Epoch 2/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2569 - acc: 0.8976 - val_loss: 0.2625 - val_acc: 0.9007
Epoch 3/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2560 - acc: 0.8958 - val_loss: 0.2546 - val_acc: 0.8900
Epoch 4/20
42663/42663 [==============================] - 0s 4us/step - loss: 0.2538 - acc: 0.8970 - val_loss: 0.2451 - val_acc: 0.9043
Epoch 5/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2526 - acc: 0.8987 - val_loss: 0.2441 - val_acc: 0.9023
Epoch 6/20
42663/42663 [==============================] - 0s 4us/step - loss: 0.2507 - acc: 0.8997 - val_loss: 0.2825 - val_acc: 0.8820
Epoch 7/20
42663/42663 [==============================] - 0s 4us/step - loss: 0.2504 - acc: 0.8993 - val_loss: 0.2837 - val_acc: 0.8847
Epoch 8/20
42663/42663 [==============================] - 0s 4us/step - loss: 0.2507 - acc: 0.8988 - val_loss: 0.2631 - val_acc: 0.8873
Epoch 9/20
42663/42663 [==============================] - 0s 4us/step - loss: 0.2471 - acc: 0.9012 - val_loss: 0.2788 - val_acc: 0.8823
Epoch 10/20
42663/42663 [==============================] - 0s 4us/step - loss: 0.2489 - acc: 0.8997 - val_loss: 0.2414 - val_acc: 0.9010
Epoch 11/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2471 - acc: 0.9017 - val_loss: 0.2741 - val_acc: 0.8880
Epoch 12/20
42663/42663 [==============================] - 0s 4us/step - loss: 0.2458 - acc: 0.9016 - val_loss: 0.2523 - val_acc: 0.8973
Epoch 13/20
42663/42663 [==============================] - 0s 4us/step - loss: 0.2433 - acc: 0.9022 - val_loss: 0.2571 - val_acc: 0.8940
Epoch 14/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2457 - acc: 0.9012 - val_loss: 0.2567 - val_acc: 0.8973
Epoch 15/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2421 - acc: 0.9020 - val_loss: 0.2411 - val_acc: 0.8957
Epoch 16/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2434 - acc: 0.9007 - val_loss: 0.2431 - val_acc: 0.9000
Epoch 17/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2431 - acc: 0.9021 - val_loss: 0.2398 - val_acc: 0.9000
Epoch 18/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2435 - acc: 0.9018 - val_loss: 0.2919 - val_acc: 0.8787
Epoch 19/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2409 - acc: 0.9029 - val_loss: 0.2478 - val_acc: 0.8943
Epoch 20/20
42663/42663 [==============================] - 0s 5us/step - loss: 0.2426 - acc: 0.9020 - val_loss: 0.2380 - val_acc: 0.9007

I plotted the accuracy and loss for both the training and validation sets:

(two images: the accuracy plot and the loss plot)

As we can see, the results are not very stable. I then picked two epochs and retrained on the whole training set. Here's the new log:

Epoch 1/2
45663/45663 [==============================] - 0s 7us/step - loss: 0.5759 - accuracy: 0.7004
Epoch 2/2
45663/45663 [==============================] - 0s 5us/step - loss: 0.5155 - accuracy: 0.7341

My question is: why is the accuracy so unstable, and why is it only 73% for the retrained model? How can I improve the model? Thanks.

3 Answers

3
votes

Your validation set has 3000 samples and your training set has 42663, so the validation set is only about 7% of the size of the training set. Your validation accuracy jumps between 0.88 and 0.90, a swing of about 2 percentage points. A validation set this small is too small to give stable statistics, and for its size a 2-point swing is not bad. A common rule of thumb is to hold out 20% to 25% of the total data for validation, i.e. roughly a 75-25 train-val split.

Also make sure you shuffle the data before making the train-val split.

If X and y are your full dataset, use:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

which shuffles the data (this is the default behaviour) and gives you a 75-25 split.
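To put a rough number on "too small to get good statistics", here is a minimal sketch that treats validation accuracy as a binomial proportion and computes its standard error. The sample counts come from the question; the 0.90 accuracy and the 25% hold-out size (about 11416 of 45663 samples) are assumptions for illustration. It only captures sampling noise for a fixed model, not the changes in the model itself from epoch to epoch.

```python
import math

def accuracy_standard_error(accuracy, n_samples):
    """Binomial standard error of an accuracy estimated on n_samples examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)

# Numbers from the question: 3000 validation samples, accuracy near 0.90.
se_small = accuracy_standard_error(0.90, 3000)
# Hypothetical 25% hold-out of all 45663 samples (about 11416 samples).
se_large = accuracy_standard_error(0.90, 11416)

print(f"n=3000:  +/- {2 * se_small:.4f} (about two standard errors)")
print(f"n=11416: +/- {2 * se_large:.4f} (about two standard errors)")
```

With 3000 samples, two standard errors come to roughly one percentage point either way, so much of the observed swing is comparable to plain sampling noise. A larger hold-out set narrows that band.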

1
votes

I don't think it's unstable at all for the validation accuracy to oscillate between 88% and 90%. If you put it on a 0-100 scale, this "instability" looks absolutely tiny.

import numpy as np
import matplotlib.pyplot as plt

# uniform() draws floats across the whole 88-90 range; randint(88, 90) would only give 88 or 89
plt.plot(np.arange(20), np.random.uniform(88, 90, 20))
plt.title('Random Values Between 88 and 90')
plt.ylim(0, 100)
plt.show()

1
votes

It's hard to tell without knowing the dataset. Currently you only use Dense layers; depending on your problem, RNNs or convolutional layers might suit the case better. I can also see that you use a pretty large batch size of 512. There are a lot of opinions about what the batch size should be. In my experience, a batch size of more than 128 might cause bad convergence, but this depends on many things.

You might also add some regularization to your net by using Dropout layers.

Another point: check that shuffling is enabled in model.fit(). Keras shuffles the training data before each epoch by default (shuffle=True); if it is turned off, the model always sees the same data in the same order, which can lower its ability to generalize.

Implementing these changes might fix the "bouncing loss"; I think shuffling is the most important one.
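To make the shuffling point concrete, here is a small NumPy sketch of what per-epoch shuffling of mini-batches looks like. It is only an illustration, not Keras's actual implementation; the helper name shuffled_batches and the toy data are hypothetical.

```python
import numpy as np

def shuffled_batches(x, y, batch_size, rng):
    """Yield mini-batches of (x, y) in a fresh random order."""
    order = rng.permutation(len(x))  # new sample order on every call
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

rng = np.random.default_rng(0)
x = np.arange(10).reshape(10, 1)  # toy features (hypothetical data)
y = np.arange(10)                 # toy labels, aligned with x

epoch_orders = []
for epoch in range(2):
    # Collect the labels in the order the batches deliver them.
    seen = np.concatenate([yb for _, yb in shuffled_batches(x, y, 4, rng)])
    epoch_orders.append(seen)
    print(f"epoch {epoch}: {seen.tolist()}")
```

Every epoch still visits each sample exactly once; only the order, and therefore the composition of each batch, changes between epochs.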