
I apologize that I'm not good at English.

I'm trying to build my own fully convolutional network (FCN) using TensorFlow, but I'm having difficulty training the model on my own image data, whereas MNIST data works properly.

Here is my FCN model code (not using a pre-trained or pre-built model):

import tensorflow as tf
import numpy as np

Loading MNIST Data

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

images_flatten = tf.placeholder(tf.float32, shape=[None, 784])

images = tf.reshape(images_flatten, [-1,28,28,1]) # conv layers expect [batch, height, width, channels]
labels = tf.placeholder(tf.float32, shape=[None, 10])
keep_prob = tf.placeholder(tf.float32) # dropout keep probability

Convolutional Layers

# Conv. Layer #1
W1 = tf.Variable(tf.truncated_normal([3, 3, 1, 4], stddev = 0.1))
b1 = tf.Variable(tf.truncated_normal([4], stddev = 0.1))    
FMA = tf.nn.conv2d(images, W1, strides=[1,1,1,1], padding='SAME')
# FMA stands for fused multiply-add, i.e. the raw convolution output
RELU = tf.nn.relu(tf.add(FMA, b1))
POOL = tf.nn.max_pool(RELU, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')

# Conv. Layer #2
W2 = tf.Variable(tf.truncated_normal([3, 3, 4, 8], stddev = 0.1))
b2 = tf.Variable(tf.truncated_normal([8], stddev = 0.1))    
FMA = tf.nn.conv2d(POOL, W2, strides=[1,1,1,1], padding='SAME')
RELU = tf.nn.relu(tf.add(FMA, b2))
POOL = tf.nn.max_pool(RELU, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')

# Conv. Layer #3
W3 = tf.Variable(tf.truncated_normal([7, 7, 8, 16], stddev = 0.1))
b3 = tf.Variable(tf.truncated_normal([16], stddev = 0.1))   
FMA = tf.nn.conv2d(POOL, W3, strides=[1,1,1,1], padding='VALID')
RELU = tf.nn.relu(tf.add(FMA, b3))

# Dropout
Dropout = tf.nn.dropout(RELU, keep_prob)

# Conv. Layer #4
W4 = tf.Variable(tf.truncated_normal([1, 1, 16, 10], stddev = 0.1))
b4 = tf.Variable(tf.truncated_normal([10], stddev = 0.1))   
FMA = tf.nn.conv2d(Dropout, W4, strides=[1,1,1,1], padding='SAME')
LAST_RELU = tf.nn.relu(tf.add(FMA, b4))

Summary: [Conv-ReLU-Pool] - [Conv-ReLU-Pool] - [Conv-ReLU] - [Dropout] - [Conv-ReLU]

Define Loss, Accuracy

prediction = tf.squeeze(LAST_RELU) 
# Because FCN returns (1 x 1 x class_num) in training

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=labels))

optimizer = tf.train.AdamOptimizer(0.001)    
train = optimizer.minimize(loss)

label_max = tf.argmax(labels, 1)
pred_max = tf.argmax(prediction, 1)
correct_pred = tf.equal(pred_max, label_max)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

Training Model

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for i in range(10000):
    image_batch, label_batch = mnist.train.next_batch(100)
    sess.run(train, feed_dict={images_flatten: image_batch, labels: label_batch, keep_prob: 0.8})
    if i % 10 == 0:
        tr = sess.run([loss, accuracy], feed_dict={images_flatten: image_batch, labels: label_batch, keep_prob: 1.0})
        print("Step %d, Loss %g, Accuracy %g" % (i, tr[0], tr[1]))

Loss: 0.784 (Approximately)

Accuracy: 94.8% (Approximately)

The problem is that training this model on MNIST works very well, but with my own data the loss is always the same (0.6319) and the output layer is always 0.

There is almost no difference in the code except for the third convolutional layer's filter size. That filter must have the same width and height as its input, which has been shrunk by the two previous pooling layers; that's why the filter size in this layer is [7, 7] for MNIST. (The shape trace is sketched below.)
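
To make that shape reasoning concrete, here is a small sketch (plain Python, just for illustration; trace_shapes is not part of my model code) that traces the spatial size through the two 2x2 max-poolings and the final VALID convolution:

def trace_shapes(height, width, filt_h, filt_w):
    # two 2x2 poolings with stride 2 halve the spatial size twice
    h, w = height // 2, width // 2
    h, w = h // 2, w // 2
    # VALID convolution with a (filt_h x filt_w) filter and stride 1
    out_h, out_w = h - filt_h + 1, w - filt_w + 1
    print("%dx%d -> pooled %dx%d -> conv output %dx%d"
          % (height, width, h, w, out_h, out_w))

trace_shapes(28, 28, 7, 7)      # MNIST: 28x28 -> 7x7 -> 1x1
trace_shapes(128, 64, 32, 16)   # my data: 128x64 -> 32x16 -> 1x1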

What is wrong with my model?

The only code that differs between the two cases (MNIST vs. my own data) is:

Placeholder

My own data is (128 x 64 x 1) and the labels are 'eyes' and 'not_eyes':

images = tf.placeholder(tf.float32, [None, 128, 64, 1])
labels = tf.placeholder(tf.int32, [None, 2])

3rd Convolutional Layer

W3 = tf.Variable(tf.truncated_normal([32, 16, 8, 16], stddev = 0.1))

Feeding (Batch)

image_data, label_data = input_data.get_batch(TRAINING_FILE, 10)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

for i in range(10000):
    image_batch, label_batch = sess.run([image_data, label_data])
    sess.run(train, feed_dict={images: image_batch, labels: label_batch, keep_prob: 0.8})
    if i % 10 == 0: ... # Validation part is almost same, too...

coord.request_stop()
coord.join(threads)

Here "input_data" is an another python file in the same directory, and "get_batch(TRAINING_FILE, 10)" is the function that returns batch data. The code is:

import re
import tensorflow as tf

def get_input_queue(txtfile_name):
    images = []
    labels = [] 

    for line in open(txtfile_name, 'r'): # each line of the txt file: image path, label name, label number
        cols = re.split(',|\n', line)
        labels.append(int(cols[2]))
        images.append(tf.image.decode_jpeg(tf.read_file(cols[0]), channels = 1)) 

    input_queue = tf.train.slice_input_producer([images, labels], shuffle = True)
    return input_queue

def get_batch(txtfile_name, batch_size):
    input_queue = get_input_queue(txtfile_name)
    image = input_queue[0]
    label = input_queue[1]

    image = tf.reshape(image, [128, 64, 1])

    batch_image, batch_label = tf.train.batch([image, label], batch_size)
    batch_label_one_hot = tf.one_hot(tf.to_int64(batch_label), 2, on_value=1.0, off_value=0.0)
    return batch_image, batch_label_one_hot

It doesn't seem to have any problem... :( Please help me!

Comments:

vijay m: Try scaling your inputs (-1 to 1) and set the stddev of your weights to a lower value like 0.01. The output always zero points to dead ReLUs.
James: What does 'scaling inputs (-1 to 1)' mean?

2 Answers

1 vote

Are your inputs scaled appropriately? The JPEGs are in the [0, 255] range and need to be scaled to [-1, 1]. You can try:

 image = tf.reshape(image, [128, 64, 1])
 image = tf.cast(image, tf.float32)      # decode_jpeg returns uint8, so cast before scaling
 image = tf.scalar_mul((1.0/255), image)
 image = tf.subtract(image, 0.5)
 image = tf.multiply(image, 2.0)
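
Once the scaling is wired into get_batch, a quick way to sanity-check it is to pull one batch and look at its range. This is only a sketch that reuses the queue-runner setup from the question:

batch_image, batch_label = get_batch(TRAINING_FILE, 10)

sess = tf.Session()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

imgs = sess.run(batch_image)
print(imgs.min(), imgs.max())   # should be close to -1.0 and 1.0 after scaling

coord.request_stop()
coord.join(threads)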
0 votes

What accuracy are you getting with your model on MNIST? It would be helpful if you posted the code. Are you using the trained model to evaluate the output on your own data?

A general suggestion on setting up a convolutional model is provided here. Here is the model structure suggested by the article:

INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC

Having more than one CONV->RELU pair before pooling improves the learning of complex features. Try N=2 instead of 1; a sketch of such a block is given below.
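
For example, the first block of your model could be expanded like this. This is only a sketch in the same TF1 style as the question; the names W1a, conv1a, etc. are illustrative, and the stddev of 0.01 follows the comment above:

# [CONV -> RELU]*2 -> POOL, i.e. N = 2
W1a = tf.Variable(tf.truncated_normal([3, 3, 1, 4], stddev=0.01))
b1a = tf.Variable(tf.zeros([4]))
conv1a = tf.nn.relu(tf.nn.conv2d(images, W1a, strides=[1, 1, 1, 1], padding='SAME') + b1a)

W1b = tf.Variable(tf.truncated_normal([3, 3, 4, 4], stddev=0.01))
b1b = tf.Variable(tf.zeros([4]))
conv1b = tf.nn.relu(tf.nn.conv2d(conv1a, W1b, strides=[1, 1, 1, 1], padding='SAME') + b1b)

pool1 = tf.nn.max_pool(conv1b, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')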

Some other suggestions:

  1. While you are preparing your data, reduce it to a smaller size than 128x64. Try the same size as the MNIST data:

    image = tf.reshape(image, [28, 28, 1])

  2. If your eye/no-eye images are in color, convert them to grayscale and normalize the values (zero mean, unit std). You can do this using numpy or tf; here is how using numpy (a combined usage sketch follows below):

Grayscale:

  img = np.dot(np.array(img, dtype='float32'), [[0.2989],[0.5870],[0.1140]])

Normalize:

 mean = np.mean(img, dtype='float32')
 std = np.std(img, dtype='float32', ddof=1)
 if std < 1e-4: std = 1.
 img = (img - mean) / std
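
Putting the two snippets together, a usage sketch might look like the following. It assumes the image is loaded as an H x W x 3 RGB array; PIL is just one way to do that, and the file name is hypothetical:

import numpy as np
from PIL import Image

# hypothetical path; any H x W x 3 RGB array works here
img = np.array(Image.open('eye_0001.jpg').convert('RGB'), dtype='float32')

# grayscale: (H, W, 3) -> (H, W, 1)
img = np.dot(img, [[0.2989], [0.5870], [0.1140]])

# normalize to zero mean, unit std
mean = np.mean(img, dtype='float32')
std = np.std(img, dtype='float32', ddof=1)
if std < 1e-4: std = 1.
img = (img - mean) / std

print(img.shape, img.mean(), img.std())   # (H, W, 1), mean ~0, std ~1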