
I am trying to design a convolutional neural network to detect a small red football. I have captured approximately 4000 pictures of a scene in different configurations (adding chairs, bottles, etc.) without the ball, and 4000 pictures of the scene, also in different configurations, with the ball somewhere inside. I am using a resolution of 32x32 px. The ball is visible to the eye in the pictures where it is present. These are some positive example pictures (shown upside down here):

I have tried numerous convolutional NN designs but cannot find a decent one. I will present two architectures I have tried (a "normal"-sized one and a very small one). I kept designing smaller and smaller networks because I thought it would help with the overfitting problem. So, I have tried:

Normal Network Design

Input: 32x32x3
First Conv Layer:

W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 3, 32], stddev=0.1), name="w1")
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]), name="b1")
h_conv1 = tf.nn.relu(tf.nn.conv2d(x, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1, name="conv1")
h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name="pool1")

2nd Conv Layer:

W_conv2 = tf.Variable(tf.truncated_normal([5, 5, 32, 16], stddev=0.1), name="w2")
b_conv2 = tf.Variable(tf.constant(0.1, shape=[16]), name="b2")
h_conv2 = tf.nn.relu(tf.nn.conv2d(h_pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2, name="conv2")
h_pool2 = tf.nn.max_pool(h_conv2, ksize=[1, 2, 2, 1],
                         strides=[1, 2, 2, 1], padding='SAME', name="pool2")

Fully connected layer:

W_fc1 = tf.Variable(tf.truncated_normal([8 * 8 * 16, 16], stddev=0.1), name="w3")
b_fc1 = tf.Variable(tf.constant(0.1, shape=[16]), name="b3")
h_pool2_flat = tf.reshape(h_pool2, [-1, 8 * 8 * 16], name="flat3")
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1, name="conv3")
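
The flattened size used above can be checked by hand: a SAME-padded convolution with stride 1 keeps the spatial size, and each SAME-padded 2x2 pooling with stride 2 halves it (rounding up), so 32 becomes 16 and then 8. The sketch below just reproduces that arithmetic in plain Python; the helper names are illustrative and not part of TensorFlow.

```python
import math


def same_pool_output(size, stride=2):
    """Spatial size after a SAME-padded pooling layer: ceil(size / stride)."""
    return math.ceil(size / stride)


def flattened_size(input_size, num_pools, channels):
    """Number of features after `num_pools` stride-2 SAME poolings.

    Assumes square inputs and stride-1 SAME convolutions, which do not
    change the spatial size.
    """
    size = input_size
    for _ in range(num_pools):
        size = same_pool_output(size)
    return size * size * channels


# Two conv/pool stages ending in 16 channels, as in the network above.
normal_flat = flattened_size(32, num_pools=2, channels=16)
print(normal_flat)
```

The result should match the first dimension of W_fc1 and the second dimension passed to tf.reshape.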

Dropout

keep_prob = tf.placeholder(tf.float32, name="keep3")
h_fc2_drop = tf.nn.dropout(h_fc1, keep_prob, name="drop3")

Readout Layer

W_fc3 = tf.Variable(tf.truncated_normal([16, 2], stddev=0.1), name="w4")
b_fc3 = tf.Variable(tf.constant(0.1, shape=[2], name="b4"))
y_conv = tf.matmul(h_fc2_drop, W_fc3, name="yconv") + b_fc3

Other info

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv) + 0.005 * tf.nn.l2_loss(W_conv1) + 0.005 * tf.nn.l2_loss(W_fc1) + 0.005 * tf.nn.l2_loss(W_fc3))

train_step = tf.train.AdamOptimizer(1e-5, name="trainingstep").minimize(cross_entropy)

# Percentage of correct predictions
prediction = tf.nn.softmax(y_conv, name="y_prediction")
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1), name="correct_pred")
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="acc")

Parameters

keep_prob: 0.4
batch_size=500
training time in generations=55

Results

Training set final accuracy= 90.2%
Validation set final accuracy= 52.2%

Graph link: Link to accuracy graph

Small Network Design

Input: 32x32x3

First Conv Layer:

W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 3, 16], stddev=0.1), name="w1")
b_conv1 = tf.Variable(tf.constant(0.1, shape=[16]), name="b1")
h_conv1 = tf.nn.relu(tf.nn.conv2d(x, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1, name="conv1")
h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name="pool1")

Fully connected layer:

W_fc1 = tf.Variable(tf.truncated_normal([16 * 16 * 16, 8], stddev=0.1), name="w3")
b_fc1 = tf.Variable(tf.constant(0.1, shape=[8]), name="b3")
h_pool2_flat = tf.reshape(h_pool1, [-1, 16 * 16 * 16], name="flat3")
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1, name="conv3")

Dropout

keep_prob = tf.placeholder(tf.float32, name="keep3")
h_fc2_drop = tf.nn.dropout(h_fc1, keep_prob, name="drop3")

Readout Layer

W_fc3 = tf.Variable(tf.truncated_normal([8, 2], stddev=0.1), name="w4")
b_fc3 = tf.Variable(tf.constant(0.1, shape=[2], name="b4"))
y_conv = tf.matmul(h_fc2_drop, W_fc3, name="yconv") + b_fc3

Other info

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv) + 0.005 * tf.nn.l2_loss(W_conv1) + 0.005 * tf.nn.l2_loss(W_fc1) + 0.005 * tf.nn.l2_loss(W_fc3))

train_step = tf.train.AdamOptimizer(1e-5, name="trainingstep").minimize(cross_entropy)

# Percentage of correct predictions
prediction = tf.nn.softmax(y_conv, name="y_prediction")
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1), name="correct_pred")
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name="acc")

Parameters

keep_prob: 0.4
batch_size=500
training time in generations=55

Results

Training set final accuracy= 87%
Validation set final accuracy= 60.6%

Graph link: Link to accuracy graph
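
As a rough sanity check on how "small" each design is, the trainable parameters can be counted directly from the weight and bias shapes listed for the two networks. The sketch below is plain arithmetic, assuming only those shapes; the helper names are illustrative.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a square-kernel convolution layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Normal design: conv 5x5x3x32, conv 5x5x32x16, FC 8*8*16 -> 16, FC 16 -> 2.
normal_total = (conv_params(5, 3, 32) + conv_params(5, 32, 16)
                + dense_params(8 * 8 * 16, 16) + dense_params(16, 2))

# Small design: conv 5x5x3x16, FC 16*16*16 -> 8, FC 8 -> 2.
small_total = (conv_params(5, 3, 16)
               + dense_params(16 * 16 * 16, 8) + dense_params(8, 2))

print(normal_total, small_total)
```

With these shapes the count comes to 31682 for the normal design and 34010 for the small one: dropping the second conv/pool stage leaves a 16x16x16 feature map, so the first fully connected layer dominates. Fewer layers does not necessarily mean fewer weights to overfit with.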

So, whatever I do, I cannot get decent accuracy on the validation set. I am sure something is missing, but I cannot identify what. I am using dropout and L2 regularization, but the network seems to overfit anyway.

Thanks for reading. Whether you are an amateur or advanced in CNNs, please leave feedback.

I think you should use a better data set; deep learning requires HUGE datasets - bakaDev
Thanks for the input @bakaDev. It's a small CNN without many layers or weights, the input is 32x32, it has only two outputs, and the task seems simple: recognising a red ball in an environment. Do you think 8000 pics aren't enough? - Vlad
1. Dataset quality is very important. 2. If you want to improve your model, you could always optimize the hyperparameters: papers.nips.cc/paper/… - bakaDev
How did you split your data into train and validation sets? Is it a random split, or is there something qualitatively different between one set and the other (like a different room or different furniture)? - lejlot

1 Answer


Your results and accuracy curve seem quite normal to me, so the model is learning fine. A few suggestions:

  • As already pointed out in the comments, you probably need a bigger data set. Compare your data set to CIFAR-10, which has 50000 training and 10000 test images, also 32x32. It's quite possible that your training data doesn't contain enough variation to predict your validation/test images. Consider image augmentation techniques to expand your data set artificially.
  • When you have enough data, use most of it for training. For example, out of 10000 images, I'd split it like this: 7000 for training, 1500 for validation and 1500 for testing. This makes overfitting less likely.
  • If you are sure that your training dataset represents the target population well, you might want to play with your regularization hyperparameters: I noticed dropout and an L2 regularizer. In general, stronger regularization (a higher dropout rate, which in tf.nn.dropout means a lower keep_prob, or a larger L2 coefficient) fights overfitting and improves generalization. Early layers usually need less dropout than later ones. Also consider trying batchnorm, another technique that helps generalization.
  • You might also want to tweak your other hyperparameters (learning rate, filter size, number of filters, batch size, etc.) to get better performance. Here's a good discussion of how to do it efficiently.
  • Did you stop training after 10 epochs (this is the limit on your charts)? You should probably give it more time, because for CIFAR-10 it sometimes takes 30-50 epochs to learn well.
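
To make the first two suggestions concrete, here is a minimal NumPy sketch of a random 70/15/15 split followed by simple augmentation (horizontal flips and small shifts) applied to the training images only. The function names, the flip probability and the shift range are assumptions chosen for illustration, and the random array merely stands in for the real 32x32 images.

```python
import numpy as np


def split_indices(n, train_frac=0.7, val_frac=0.15, seed=0):
    """Randomly partition range(n) into train/validation/test index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])


def augment(images, rng, max_shift=2, flip_prob=0.5):
    """Return a randomly flipped and shifted copy of an (N, H, W, C) batch."""
    n, h, w, _ = images.shape
    out = images.copy()
    # Mirror a random subset of images left-to-right.
    flip = rng.random(n) < flip_prob
    out[flip] = out[flip][:, :, ::-1]
    # Shift by padding with edge pixels and cropping at a random offset.
    # Keep max_shift small so a ball near the border is not pushed out of frame.
    p = max_shift
    padded = np.pad(out, ((0, 0), (p, p), (p, p), (0, 0)), mode="edge")
    for i in range(n):
        dy, dx = rng.integers(0, 2 * p + 1, size=2)
        out[i] = padded[i, dy:dy + h, dx:dx + w]
    return out


# Stand-in data: 100 random 32x32 RGB images (replace with the real set).
rng = np.random.default_rng(42)
images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
train_idx, val_idx, test_idx = split_indices(len(images))
train_aug = augment(images[train_idx], rng)  # augment the training set only
```

A purely random split only helps if the validation images come from the same distribution as the training images; if the scenes were captured in separate sessions, it may be worth checking whether the split mixes them.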