I'm trying to replicate some of the examples from Neural Networks and Deep Learning in Keras, but I'm having trouble training a network based on the architecture from chapter 1. The aim is to classify handwritten digits from the MNIST dataset. The architecture:
- 784 inputs (one for each of the 28 * 28 pixels in MNIST images)
- a hidden layer of 30 neurons
- an output layer of 10 neurons
- Weights and biases are initialized from a Gaussian distribution with mean 0 and standard deviation 1.
- The loss/cost function is mean squared error.
- The optimizer is stochastic gradient descent.
Hyper-parameters:
- learning rate = 3.0
- batch size = 10
- epochs = 30
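For reference, this is the setup I'm trying to reproduce from the book's own chapter 1 code (going from memory, so the exact module and function names — mnist_loader, load_data_wrapper, network.Network, Network.SGD — are as I recall them from the book's network.py and mnist_loader.py):
import mnist_loader
import network
# load the MNIST data the way the book does
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
# 784 inputs, 30 hidden neurons, 10 outputs
net = network.Network([784, 30, 10])
# SGD(training_data, epochs, mini_batch_size, eta, ...): 30 epochs, batch size 10, learning rate 3.0
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)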
My code:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.initializers import RandomNormal
# import data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# input image dimensions
img_rows, img_cols = 28, 28
x_train = x_train.reshape(x_train.shape[0], img_rows * img_cols)
x_test = x_test.reshape(x_test.shape[0], img_rows * img_cols)
input_shape = (img_rows * img_cols,)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print('y_train shape:', y_train.shape)
# Construct model
# 784 * 30 * 10
# Normal distribution for weights/biases
# Stochastic Gradient Descent optimizer
# Mean squared error loss (cost function)
model = Sequential()
layer1 = Dense(30,
               input_shape=input_shape,
               kernel_initializer=RandomNormal(stddev=1),
               bias_initializer=RandomNormal(stddev=1))
model.add(layer1)
layer2 = Dense(10,
               kernel_initializer=RandomNormal(stddev=1),
               bias_initializer=RandomNormal(stddev=1))
model.add(layer2)
print('Layer 1 input shape: ', layer1.input_shape)
print('Layer 1 output shape: ', layer1.output_shape)
print('Layer 2 input shape: ', layer2.input_shape)
print('Layer 2 output shape: ', layer2.output_shape)
model.summary()
model.compile(optimizer=SGD(lr=3.0),
              loss='mean_squared_error',
              metrics=['accuracy'])
# Train
model.fit(x_train,
          y_train,
          batch_size=10,
          epochs=30,
          verbose=2)
# Run on test data and output results
result = model.evaluate(x_test,
                        y_test,
                        verbose=1)
print('Test loss: ', result[0])
print('Test accuracy: ', result[1])
Output (Using Python 3.6 and the TensorFlow backend):
Using TensorFlow backend.
x_train shape: (60000, 784)
60000 train samples
10000 test samples
y_train shape: (60000, 10)
Layer 1 input shape: (None, 784)
Layer 1 output shape: (None, 30)
Layer 2 input shape: (None, 30)
Layer 2 output shape: (None, 10)
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 30)                23550
_________________________________________________________________
dense_2 (Dense)              (None, 10)                310
=================================================================
Total params: 23,860
Trainable params: 23,860
Non-trainable params: 0
_________________________________________________________________
Epoch 1/30
- 7s - loss: nan - acc: 0.0987
Epoch 2/30
- 7s - loss: nan - acc: 0.0987
(repeated for all 30 epochs)
Epoch 30/30
- 6s - loss: nan - acc: 0.0987
10000/10000 [==============================] - 0s 22us/step
Test loss: nan
Test accuracy: 0.098
As you can see, the network isn't learning at all, and I'm not sure why. The shapes look all right as far as I can tell. What am I doing that's preventing the network from learning?
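In case it helps with debugging, this is the kind of check I can run after training to see whether the weights themselves have diverged to NaN (a quick sketch using the model object from the code above):
import numpy as np
# inspect every weight/bias array in the trained model for NaN values
for i, w in enumerate(model.get_weights()):
    print('Parameter array', i, 'shape', w.shape, '- contains NaN:', np.isnan(w).any())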
(Incidentally, I know that cross-entropy loss and a softmax output layer would work better; however, judging from the book, they shouldn't be necessary for the network to learn at all. The book's manually implemented network in chapter 1 learns successfully with this setup; I'm trying to replicate that before moving on.)