I'm trying to replicate some of the examples from Neural Networks and Deep Learning in Keras, but I'm having trouble training a network based on the architecture from chapter 1. The aim is to classify handwritten digits from the MNIST dataset. The architecture:
- 784 inputs (one for each of the 28 * 28 pixels in MNIST images)
- a hidden layer of 30 neurons
- an output layer of 10 neurons
- Weights and biases are initialized from a Gaussian distribution with mean 0 and standard deviation 1.
- The loss/cost function is mean squared error.
- The optimizer is stochastic gradient descent.
Hyper-parameters:
- learning rate = 3.0
- batch size = 10
- epochs = 30
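For reference, this is the setup I'm trying to reproduce from the book's own chapter 1 code (going from memory, so the exact module and function names — mnist_loader, load_data_wrapper, network.Network, Network.SGD — are as I recall them from the book's network.py and mnist_loader.py):
import mnist_loader
import network
# load the MNIST data the way the book does
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
# 784 inputs, 30 hidden neurons, 10 outputs
net = network.Network([784, 30, 10])
# SGD(training_data, epochs, mini_batch_size, eta, ...): 30 epochs, batch size 10, learning rate 3.0
net.SGD(training_data, 30, 10, 3.0, test_data=test_data)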
My code:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from keras.initializers import RandomNormal
# import data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# input image dimensions
img_rows, img_cols = 28, 28
x_train = x_train.reshape(x_train.shape[0], img_rows * img_cols)
x_test = x_test.reshape(x_test.shape[0], img_rows * img_cols)
input_shape = (img_rows * img_cols,)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print('y_train shape:', y_train.shape)
# Construct model
# 784 * 30 * 10
# Normal distribution for weights/biases
# Stochastic Gradient Descent optimizer
# Mean squared error loss (cost function)
model = Sequential()
layer1 = Dense(30,
               input_shape=input_shape,
               kernel_initializer=RandomNormal(stddev=1),
               bias_initializer=RandomNormal(stddev=1))
model.add(layer1)
layer2 = Dense(10,
               kernel_initializer=RandomNormal(stddev=1),
               bias_initializer=RandomNormal(stddev=1))
model.add(layer2)
print('Layer 1 input shape: ', layer1.input_shape)
print('Layer 1 output shape: ', layer1.output_shape)
print('Layer 2 input shape: ', layer2.input_shape)
print('Layer 2 output shape: ', layer2.output_shape)
model.summary()
model.compile(optimizer=SGD(lr=3.0),
              loss='mean_squared_error',
              metrics=['accuracy'])
# Train
model.fit(x_train,
          y_train,
          batch_size=10,
          epochs=30,
          verbose=2)
# Run on test data and output results
result = model.evaluate(x_test,
                        y_test,
                        verbose=1)
print('Test loss: ', result[0])
print('Test accuracy: ', result[1])
Output (Using Python 3.6 and the TensorFlow backend):
Using TensorFlow backend.
x_train shape: (60000, 784)
60000 train samples
10000 test samples
y_train shape: (60000, 10)
Layer 1 input shape: (None, 784)
Layer 1 output shape: (None, 30)
Layer 2 input shape: (None, 30)
Layer 2 output shape: (None, 10)
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 30)                23550
_________________________________________________________________
dense_2 (Dense)              (None, 10)                310
=================================================================
Total params: 23,860
Trainable params: 23,860
Non-trainable params: 0
_________________________________________________________________
Epoch 1/30
- 7s - loss: nan - acc: 0.0987
Epoch 2/30
- 7s - loss: nan - acc: 0.0987
(repeated for all 30 epochs)
Epoch 30/30
- 6s - loss: nan - acc: 0.0987
10000/10000 [==============================] - 0s 22us/step
Test loss: nan
Test accuracy: 0.098
As you can see, the network isn't learning at all, and I'm not sure why. The shapes look all right as far as I can tell. What am I doing that's preventing the network from learning?
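In case it helps with debugging, this is the kind of check I can run after training to see whether the weights themselves have diverged to NaN (a quick sketch using the model object from the code above):
import numpy as np
# inspect every weight/bias array in the trained model for NaN values
for i, w in enumerate(model.get_weights()):
    print('Parameter array', i, 'shape', w.shape, '- contains NaN:', np.isnan(w).any())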
(Incidentally, I know that cross-entropy loss and a softmax output layer would work better; however, judging from the book, they shouldn't be necessary for the network to learn at all. The book's manually implemented network in chapter 1 learns successfully with this setup; I'm trying to replicate that before moving on.)