
I'm having trouble with my CNN.

Input: 700 pictures of a road (each of size 200x66 and 3 channels).

Output: Steering angle of the car on that part of the road (value between -1.5 and 1.5).

Some might recognize this setup from Nvidia's end-to-end self-driving work, which I am trying to reproduce. Whenever I run prediction, I get the same value for every picture in my dataset. After training for only one epoch, the values differ only in the last few decimal places.

I split the data into 600 pictures for training and 100 for evaluation and used this network:

from tflearn.layers.core import input_data, dropout, flatten, fully_connected
from tflearn.layers.conv import conv_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression

network = input_data(shape=[None, 66, 200, 3], name='input')
network = local_response_normalization(network)
network = conv_2d(network, 24, 5, 2, activation='elu') # 5x5 with 2x2 stride
network = conv_2d(network, 36, 5, 2, activation='elu') # 5x5 with 2x2 stride
network = conv_2d(network, 48, 5, 2, activation='elu') # 5x5 with 2x2 stride
network = conv_2d(network, 64, 3, 1, activation='elu') # 3x3 with 1x1 stride
network = conv_2d(network, 64, 3, 1, activation='elu') # 3x3 with 1x1 stride
network = flatten(network)
network = dropout(network, 0.5)
network = fully_connected(network, 1164, activation='elu')
network = fully_connected(network, 100, activation='elu')
network = fully_connected(network, 50, activation='elu')
network = fully_connected(network, 10, activation='elu')
network = fully_connected(network, 1, activation='elu')
network = regression(network, optimizer='adam', learning_rate=0.01, loss='mean_square', name='target')

I tried different batch sizes (1-100), different numbers of epochs (1-50), and different learning rates (0.001-0.02).

Is it just my dataset being too small for this network?
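[Editor's note] For scale, here is a quick count of the trainable parameters in the network above, assuming 'valid' padding as in the Nvidia architecture (note that tflearn's `conv_2d` defaults to 'same' padding, which would make the flattened layer, and thus the total, even larger). Either way, the model is heavily over-parameterized relative to 600 training images:

```python
# Rough parameter count for the network in the question,
# assuming 'valid' padding throughout.

def conv_out(n, k, s):
    """Output length of a 'valid' convolution along one dimension."""
    return (n - k) // s + 1

def count_params():
    h, w, c = 66, 200, 3
    total = 0
    # (filters, kernel, stride) per conv layer, as in the question
    for f, k, s in [(24, 5, 2), (36, 5, 2), (48, 5, 2), (64, 3, 1), (64, 3, 1)]:
        total += (k * k * c + 1) * f          # weights + biases
        h, w, c = conv_out(h, k, s), conv_out(w, k, s), f
    units = h * w * c                          # flatten: 1 * 18 * 64 = 1152
    for n in [1164, 100, 50, 10, 1]:           # fully connected stack
        total += (units + 1) * n
        units = n
    return total

print(count_params())  # roughly 1.6M trainable parameters
```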

Hope you can help me.

Thank you.

1
What is the distribution of your training set (label-wise)? - ahmet hamza emra
I've standardized the steering angle now; that didn't change anything. I also changed the activation function and am shuffling the data after each epoch now... Unfortunately none of this makes a difference. The CNN still predicts the same value every time (and it isn't even near the mean). - Big M
I'm unsure of the reason, but I had exactly the same problem, and found that switching the optimiser to stochastic gradient descent fixed it. - Will Andrew
I think my activation functions were wrong for this task. But even with different ones + SGD as the optimizer, the problem still persists. - Big M

1 Answer


I think the OP is referring to the paper "End to End Learning for Self-Driving Cars", which talks about "less than 100 hours of driving", sampled at 10 Hz. That is up to 3.6M samples (100 h × 3600 s/h × 10 samples/s) before cleaning and augmentation, compared to a few hundred here.

Also, the OP slightly misread the network architecture: there is no 1164-unit FC layer; that number is just the result of flattening the last conv layer.

I did have some success replicating the paper with about 50K samples and a much shallower network (3 conv blocks, 1 hidden FC layer). The number of parameters was about the same, though: ~250K.