You are basically trying to solve a regression problem. Apart from what you have already done, there are a few other things you can try:
- Use image augmentation techniques to generate more data, and normalize the images.
- Make a deeper model with a few more convolution layers.
- Use a proper weight initializer, such as He-normal, for the convolution layers.
- Use BatchNormalization between layers so that the activations flowing through the network have roughly zero mean and unit standard deviation.
- Use a cross-entropy loss, as it gives better-behaved gradients; with MSE the gradients tend to become very small over time, even though MSE is usually the preferred choice for regression problems.
- Try changing the optimizer to Adam.
- If your dataset has more classes and suffers from class imbalance, you can use focal loss, a variant of cross-entropy that penalizes misclassified examples more heavily than correctly classified ones. Reducing the batch size and upsampling the minority classes should also help.
- Use Bayesian optimization techniques for hyperparameter tuning of your model.
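For the augmentation point, here is a minimal NumPy sketch (deliberately library-agnostic — in Keras you would typically use `ImageDataGenerator` instead) of random-shift augmentation plus `[0, 1]` normalization; the shift range and array shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(images, max_shift=2):
    """Randomly shift each image by up to max_shift pixels in x and y
    (a simple, label-preserving augmentation for digit images)."""
    out = np.empty_like(images)
    for i, img in enumerate(images):
        dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
        out[i] = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return out

# normalize pixel values to [0, 1], as in the sample code below
images = rng.integers(0, 256, size=(8, 28, 28)).astype('float32')
images /= 255.0

augmented = augment(images)
```

Concatenating `augmented` with the original batch effectively doubles the training data for those images.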
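The focal-loss idea can be sketched in plain NumPy for the binary case; `gamma` (focusing) and `alpha` (class balance) are the usual parameters, and the values here are just the common defaults from the original paper:

```python
import numpy as np

def focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss: (1 - p_t)^gamma down-weights examples the
    model already classifies confidently, so hard examples dominate."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    pt = np.where(y_true == 1, y_pred, 1 - y_pred)      # prob. of the true class
    weight = np.where(y_true == 1, alpha, 1 - alpha)    # class-balance weight
    return -np.mean(weight * (1 - pt) ** gamma * np.log(pt))

y_true = np.array([1, 1, 0, 0])
confident = np.array([0.95, 0.90, 0.10, 0.05])  # mostly well classified
uncertain = np.array([0.60, 0.55, 0.45, 0.40])  # near the decision boundary
```

Comparing `focal_loss(y_true, uncertain)` with `focal_loss(y_true, confident)` shows the loss concentrating on the poorly classified examples.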
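For hyperparameter tuning you would normally reach for a library (e.g. scikit-optimize or Keras Tuner), but the core Bayesian-optimization loop — fit a surrogate to past trials, then evaluate the point with the highest expected improvement — can be sketched in NumPy. The 1-D objective below is a made-up stand-in for validation loss as a function of log10(learning rate):

```python
import numpy as np
from math import erf

def gp_posterior(X_obs, y_obs, X_query, length=0.3, noise=1e-4):
    """Posterior mean/std of a 1-D Gaussian-process surrogate (RBF kernel)."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)
    K_inv = np.linalg.inv(k(X_obs, X_obs) + noise * np.eye(len(X_obs)))
    Ks = k(X_obs, X_query)
    mu = Ks.T @ K_inv @ y_obs
    var = 1.0 - np.sum(Ks * (K_inv @ Ks), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    """EI acquisition for minimization."""
    z = (best - mu) / sigma
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)          # normal pdf
    Phi = 0.5 * (1 + np.vectorize(erf)(z / np.sqrt(2)))       # normal cdf
    return (best - mu) * Phi + sigma * phi

def objective(x):
    # hypothetical validation loss vs. log10(lr); minimum near lr = 1e-3
    return 0.1 * (x + 3.0) ** 2 + 0.05

grid = np.linspace(-5, -1, 200)
X_obs = np.array([-5.0, -1.0])          # two initial trials at the edges
y_obs = objective(X_obs)
for _ in range(10):
    mu, sigma = gp_posterior(X_obs, y_obs, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y_obs.min()))]
    X_obs = np.append(X_obs, x_next)
    y_obs = np.append(y_obs, objective(x_next))

best_log_lr = X_obs[np.argmin(y_obs)]
```

After a handful of trials the best observed point sits near log10(lr) = -3, i.e. the sampler homes in on the good region instead of sweeping a full grid.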
A sample model code:

import os
import pickle

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Dropout, Flatten, Dense
from keras.optimizers import Adam
from keras.utils import to_categorical

def load_data():
    with open(os.path.join(DATA_DIR, 'mnist.pickle'), 'rb') as fr:
        X_train, Y_train, X_val, Y_val = pickle.load(fr)
    # reshape to (samples, height, width, channels) so the Conv2D layers
    # in build_model() below can consume the images
    X_train = X_train.reshape(60000, 28, 28, 1).astype('float32') / 255
    X_val = X_val.reshape(10000, 28, 28, 1).astype('float32') / 255
    nb_classes = 10
    Y_train = to_categorical(Y_train, nb_classes)
    Y_val = to_categorical(Y_val, nb_classes)
    return X_train, Y_train, X_val, Y_val
def build_model(input_shape, classes, dropout=True):
    model = Sequential()
    model.add(Conv2D(32, (5, 5), activation='relu', kernel_initializer='he_uniform', padding='valid', input_shape=input_shape))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2), strides=1, padding='valid'))
    if dropout:
        model.add(Dropout(0.2))
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='valid'))
    model.add(Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='valid'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2), strides=2, padding='valid'))
    if dropout:
        model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
    model.add(BatchNormalization())
    model.add(Dense(classes, activation='softmax', kernel_initializer='he_uniform'))
    # optimizer = SGD(lr=0.01, decay=1e-6, momentum=0.9)
    optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
Comments (Marcin Możejko):
- When you use softmax you add a spurious condition to your output, namely the coordinates summing up to 1.
- softmax or sigmoid?
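The point raised in the comments — softmax constrains the outputs to sum to 1, while independent sigmoids impose no such condition — can be checked directly with a few made-up logits:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, -1.0, 0.5])
p_soft = softmax(logits)   # coordinates forced to sum to 1
p_sig = sigmoid(logits)    # each output squashed independently
```

So if your target values are genuinely independent coordinates (as in a regression problem), softmax adds a constraint the data does not satisfy, while per-output sigmoids (or a linear output layer) do not.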