I am learning TensorFlow and was implementing a simple neural network as explained in MNIST for Beginners in TensorFlow docs. Here is the link. The accuracy was about 80-90 %, as expected.
Then following the same article was MNIST for Experts using ConvNet. Instead of implementing that I decided to improve the beginner part. I know about Neural Nets and how they learn and the fact that deep networks can perform better than shallow networks. I modified the original program in MNIST for Beginner to implement a Neural network with 2 hidden layers each of 16 neurons.
It looks something like this :
Image of Network
Code for it
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
x = tf.placeholder(tf.float32, [None, 784], 'images')
y = tf.placeholder(tf.float32, [None, 10], 'labels')
# We are going to make 2 hidden layer neurons with 16 neurons each
# All the weights in network
W0 = tf.Variable(dtype=tf.float32, name='InputLayerWeights', initial_value=tf.zeros([784, 16]))
W1 = tf.Variable(dtype=tf.float32, name='HiddenLayer1Weights', initial_value=tf.zeros([16, 16]))
W2 = tf.Variable(dtype=tf.float32, name='HiddenLayer2Weights', initial_value=tf.zeros([16, 10]))
# All the biases for the network
B0 = tf.Variable(dtype=tf.float32, name='HiddenLayer1Biases', initial_value=tf.zeros([16]))
B1 = tf.Variable(dtype=tf.float32, name='HiddenLayer2Biases', initial_value=tf.zeros([16]))
B2 = tf.Variable(dtype=tf.float32, name='OutputLayerBiases', initial_value=tf.zeros([10]))
def build_graph():
"""This functions wires up all the biases and weights of the network
and returns the last layer connections
:return: returns the activation in last layer of network/output layer without softmax
"""
A1 = tf.nn.relu(tf.matmul(x, W0) + B0)
A2 = tf.nn.relu(tf.matmul(A1, W1) + B1)
return tf.matmul(A2, W2) + B2
def print_accuracy(sx, sy, tf_session):
"""This function prints the accuracy of a model at the time of invocation
:return: None
"""
correct_prediction = tf.equal(tf.argmax(y), tf.argmax(tf.nn.softmax(build_graph())))
correct_prediction_float = tf.cast(correct_prediction, dtype=tf.float32)
accuracy = tf.reduce_mean(correct_prediction_float)
print(accuracy.eval(feed_dict={x: sx, y: sy}, session=tf_session))
y_predicted = build_graph()
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_predicted))
model = tf.train.GradientDescentOptimizer(0.03).minimize(cross_entropy)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for _ in range(1000):
batch_x, batch_y = mnist.train.next_batch(50)
if _ % 100 == 0:
print_accuracy(batch_x, batch_y, sess)
sess.run(model, feed_dict={x: batch_x, y: batch_y})
The Output expected was supposed to be better than what could be achieved with when just a single layer (Assumed that W0 has shape of [784,10] and B0 has shape of [10])
def build_graph():
return tf.matmul(x,W0) + B0
Instead, the output says that network was not training at all. The Accuracy was not crossing 20% in any iteration.
Output
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
0.1
My Question
What is wrong with the above program that it does not generalize at all? How can I improve it more without using convolutional neural networks?