Basic softmax model implementation on 150x150 images

Question

While learning TensorFlow, I tried to adapt the basic softmax MNIST example to work on my own image set. It consists of aerial photographs of buildings, and I want to classify them by roof type. There are 4 possible classes.

The simple (maybe naive) idea was to resize the images (since they're not all the same size) and flatten them, then change the tensor shapes in the code and run it. Of course, it doesn't work. First, let me show you the code.

import csv

import numpy as np
import tensorflow as tf

# Load csv Data
filenames = []
_answers = []
with open('/home/david/DSG/id_train.csv') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=',')
    for row in csv_reader:
        one_hot_vec = [0, 0, 0, 0]
        one_hot_vec[int(row[1])-1] = 1
        _answers.append(np.asarray(one_hot_vec))
        filenames.append("/home/david/DSG/roof_images/" + str(row[0]) + ".jpg")


sess = tf.InteractiveSession()

# Image Loading and processing
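# shuffle=False keeps the file order aligned with _answers (the default is shuffle=True)
filename_q = tf.train.string_input_producer(filenames, shuffle=False)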
reader = tf.WholeFileReader()
key, value = reader.read(filename_q)
__img = tf.image.decode_jpeg(value, channels=1)
_img = tf.expand_dims(tf.image.convert_image_dtype(__img, tf.float32),0)
img = tf.image.resize_nearest_neighbor(_img, [150,150])

# Actual model
x = tf.placeholder(tf.float32, [None, 22500])
W = tf.Variable(tf.zeros([22500, 4]))
b = tf.Variable(tf.zeros([4]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Training algorithm
y_ = tf.placeholder(tf.float32, [None, 4])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y,1e-10,1.0)), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# Evaluate model: this checks the results from y (the prediction matrix) against the known answers (y_)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

coord = tf.train.Coordinator()
init_op = tf.initialize_all_variables()
sess.run(init_op)

threads = tf.train.start_queue_runners(sess=sess, coord=coord)

# Loads and process all the images, adding them to an array for later use
images = []
for i in range(8000):
    if i % 100 == 0:
        print("Processing Images " + str(100*(i+100)/8000) + "% complete")
    image = img.eval().flatten()
    images.append(image)

# Train our model
for i in range(80):
    print("Training the Model " + str(100*(i+1)/80) + "% complete")
    batchImages = images[i*100:((i+1)*100)]
    batchAnswers = np.asarray(_answers[i*100:((i+1)*100)]).astype(float)
    # Here's a debug line I put in to see what the numbers were
    print(sess.run(y, feed_dict={x: batchImages, y_: batchAnswers}))
    sess.run(train_step, feed_dict={x: batchImages, y_: batchAnswers})

coord.request_stop()
coord.join(threads)

As can be seen, I print the y values from softmax as I go along. The resulting tensors all look like this: [0., 0., 0., 1.]. I thought this was pretty strange, so I printed the value of tf.matmul(x, W) + b.

The result was this:

[[-236.86216736 -272.89904785   59.67744446  450.08377075]
 [-327.19482422 -384.06918335   87.47353363  623.79052734]
 [-230.79460144 -264.78787231   60.29759598  435.28485107]
 [-188.10324097 -212.30155945   53.8230629   346.58175659]
 [-180.26617432 -209.45767212   48.90292358  340.82092285]
 [-177.13232422 -200.59474182   45.97179413  331.75531006]
 [-225.94104004 -258.97390747   61.54353333  423.37136841]
 [-259.33599854 -290.73773193   67.69062042  482.38308716]
 [-151.53468323 -174.09906006   39.97481537  285.65893555]
 [-237.23356628 -272.71789551   65.12500763  444.82647705]
 ..... you get the idea
 [-195.14971924 -221.30851746   53.09790802  363.36032104]
 [-157.30508423 -175.47320557   40.4044342   292.37384033]
 [-178.94332886 -203.36262512   47.0838356   335.22219849]
 [-180.61688232 -200.0609436    45.12242508  335.55541992]
 [-145.7559967  -163.06838989   35.25980377  273.56466675]
 [-194.07254028 -213.78709412   53.14990997  354.70977783]
 [-191.92044067 -219.13395691   49.84062958  361.21377563]]

For the first, second and third elements, calculating softmax manually gives numbers of the order of E-200, essentially zero, and then a value of essentially 1 for the fourth element. Since they all follow this pattern, clearly something is wrong.
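The manual softmax calculation can be reproduced with a small NumPy sketch. It uses the first two rows of the matrix printed above and subtracts the row maximum before exponentiating, which is a common stability trick and not something taken from the original code. The function name `softmax` here is just an illustrative helper.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by each row's maximum for numerical stability."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)


# First two rows of the tf.matmul(x, W) + b output printed above.
logits = np.array([
    [-236.86216736, -272.89904785, 59.67744446, 450.08377075],
    [-327.19482422, -384.06918335, 87.47353363, 623.79052734],
])

probs = softmax(logits)
print(probs)               # the fourth column dominates every row
print(probs.sum(axis=1))   # each row still sums to 1
```

With gaps of several hundred between the logits, the exponentials differ by so many orders of magnitude that the largest class takes all of the probability mass, which is why every row prints as [0., 0., 0., 1.].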

I've checked the inputs: my answers are one-hot vectors like [0, 1, 0, 0], and my images are flattened with values normalized to floats between 0 and 1, just like in the MNIST example.

I also noticed that in the MNIST example the values from matmul are much smaller, of the order of E0. Is that because there are 784 elements in each image, as opposed to 22500? Is this the cause of the problem?

Heck maybe this will never work for some reason. I need some help.

EDIT: I decided to check whether the image size was having any effect, and sure enough, the matmul does give smaller numbers. However, they still exhibit a pattern, so I ran them through softmax again and got this output:

[[  2.12474524e-20   1.00000000e+00   1.10456488e-18   0.00000000e+00]
 [  3.22400550e-21   1.00000000e+00   1.24568592e-19   0.00000000e+00]
 [  2.49283055e-28   1.00000000e+00   6.52334536e-26   0.00000000e+00]
 [  4.73190862e-23   1.00000000e+00   3.71980738e-21   0.00000000e+00]
 [  1.11151765e-26   1.00000000e+00   4.14652626e-24   0.00000000e+00]
 [  2.23096276e-22   1.00000000e+00   7.21511359e-21   0.00000000e+00]
 [  1.41888640e-23   1.00000000e+00   2.13637447e-21   0.00000000e+00]
 [  3.55662848e-17   1.00000000e+00   5.14018079e-16   4.06785808e-33]
 [  8.25783417e-26   1.00000000e+00   2.95267040e-23   0.00000000e+00]
 [  1.09395607e-25   1.00000000e+00   3.76775998e-23   0.00000000e+00]
 [  9.34879669e-13   1.00000000e+00   1.07488766e-11   7.21446627e-25]
 [  3.09687017e-34   1.00000000e+00   5.22547065e-31   0.00000000e+00]
 [  2.10362117e-22   1.00000000e+00   1.31067148e-20   0.00000000e+00]
 [  5.86830220e-23   1.00000000e+00   9.55902033e-21   0.00000000e+00]
 [  9.59656235e-17   1.00000000e+00   2.98987045e-15   7.10348533e-32]
 [  2.33712669e-16   1.00000000e+00   3.26934410e-15   1.55066807e-31]
 [  1.09302052e-27   1.00000000e+00   5.34793657e-25   0.00000000e+00]
 [  1.67101349e-25   1.00000000e+00   1.15098012e-22   0.00000000e+00]
 [  4.46111042e-26   1.00000000e+00   1.23599421e-23   0.00000000e+00]
 [  1.31791856e-24   1.00000000e+00   2.25831162e-22   0.00000000e+00]
 [  2.19408324e-12   1.00000000e+00   5.67631081e-11   1.22608556e-23]]

Something else must be wrong then.

Have you tried different learning rates? A learning rate of 0.5 seems very high to me. The huge numbers you get in tf.matmul(x, W) + b could be the result of "overshooting" right from the very first training step due to the learning rate being too high. Maybe try something really low like 0.001 and see where it gets you? — lballes
Thanks for that, it actually seems to be helping. I now get much more reasonable probabilities from softmax. However they still follow a definite pattern and one of the 4 classes always wins. — DavidColson
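The overshooting described in the comments can be illustrated with a rough NumPy sketch of a single gradient-descent step from zero-initialized weights. The data here is random stand-in data (uniform pixels in [0, 1) and random labels), not the roof images, and `max_logit_after_one_step` is a hypothetical helper written only for this illustration.

```python
import numpy as np


def max_logit_after_one_step(num_pixels, learning_rate,
                             batch_size=100, num_classes=4, seed=0):
    """Largest |logit| on the batch after one gradient step from zero weights."""
    rng = np.random.default_rng(seed)
    # Labels are drawn first so that they are identical for every image size.
    labels = rng.integers(0, num_classes, size=batch_size)
    y_true = np.eye(num_classes)[labels]
    x = rng.random((batch_size, num_pixels))  # stand-in pixels in [0, 1)

    # With W = 0 and b = 0, softmax predicts a uniform distribution.
    y_pred = np.full((batch_size, num_classes), 1.0 / num_classes)

    # Gradient of the mean cross-entropy with respect to the logits.
    grad_logits = (y_pred - y_true) / batch_size

    # One plain gradient-descent update, starting from zeros.
    W = -learning_rate * (x.T @ grad_logits)
    b = -learning_rate * grad_logits.sum(axis=0)

    logits = x @ W + b
    return np.abs(logits).max()


for pixels, lr in [(784, 0.5), (22500, 0.5), (22500, 0.001)]:
    print(pixels, lr, max_logit_after_one_step(pixels, lr))
```

Under these assumptions the logits after one step grow with both the number of pixels and the learning rate, since each logit is a sum over every pixel. That is consistent with 22500 inputs at a rate of 0.5 saturating softmax while 784 inputs do not, and with a much lower rate giving more reasonable probabilities.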

Olivier Moindrot · Accepted Answer · 2016-06-26T21:25:34

Your dataset might be unbalanced, which will make the network harder to train as it will tend to predict the most probable class.
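A quick way to check for imbalance is to count how often each label occurs. The sketch below assumes the CSV has no header and stores the label in the second column, as the question's loading code implies. The rows in `sample` are made up for illustration and stand in for id_train.csv, and `class_frequencies` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def class_frequencies(csv_file, label_column=1):
    """Return the fraction of rows belonging to each label."""
    counts = Counter(int(row[label_column]) for row in csv.reader(csv_file))
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


# Made-up "image_id,label" rows standing in for the real training CSV.
sample = io.StringIO("101,1\n102,4\n103,4\n104,2\n105,4\n106,3\n107,4\n108,4\n")

freqs = class_frequencies(sample)
print(freqs)

# Accuracy obtained by always predicting the most frequent class.
majority_baseline = max(freqs.values())
print(majority_baseline)
```

If one class dominates, always predicting it already gives the majority baseline accuracy, so a trained model is only informative if it beats that number.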

I think your one-layer model is just not powerful enough to train on a whole dataset. You should maybe add more layers and use convolutions along with max pooling.

But if you want to verify that this model can work, try to train it on a much smaller number of images (e.g., 50 images) and see if it can overfit this small training set.
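The overfitting check could look roughly like the following NumPy sketch of the same one-layer softmax model. It runs on 50 random stand-in images rather than the real data, and it centers each pixel and uses a learning rate of 0.01. Both choices are assumptions made here to keep gradient descent stable, not part of the original code, and `overfit_small_subset` is a hypothetical helper.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by each row's maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)


def overfit_small_subset(images, labels, num_classes=4,
                         learning_rate=0.01, steps=200):
    """Train softmax regression on a tiny set; return (loss, accuracy) per step."""
    x = images - images.mean(axis=0)  # center each pixel (assumption, see above)
    y_true = np.eye(num_classes)[labels]
    W = np.zeros((x.shape[1], num_classes))
    b = np.zeros(num_classes)
    history = []
    for _ in range(steps):
        y_pred = softmax(x @ W + b)
        # Same clipped cross-entropy as in the question's code.
        loss = -np.mean(np.sum(y_true * np.log(np.clip(y_pred, 1e-10, 1.0)), axis=1))
        accuracy = np.mean(y_pred.argmax(axis=1) == labels)
        history.append((loss, accuracy))
        # Gradient of the mean cross-entropy with respect to the logits.
        grad_logits = (y_pred - y_true) / len(x)
        W -= learning_rate * (x.T @ grad_logits)
        b -= learning_rate * grad_logits.sum(axis=0)
    return history


# Random stand-in data: 50 flattened 150x150 "images" with random labels.
rng = np.random.default_rng(0)
images = rng.random((50, 22500))
labels = rng.integers(0, 4, size=50)

history = overfit_small_subset(images, labels)
print("first step:", history[0])
print("last step:", history[-1])
```

If the training accuracy on the small subset does not climb toward 100%, the problem is more likely in the input pipeline or the optimization settings than in the capacity of the model.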
