Captcha recognition with a convnet: how to define the loss function

Question

I have a small research project in which I try to decode some captcha images. I use a convnet implemented in TensorFlow 0.9, based on the MNIST example (https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py).

My code is available on GitHub: https://github.com/ksopyla/decapcha/blob/master/decaptcha_convnet.py

I have tried to reproduce the idea described in:

"Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks", Goodfellow et al. (https://arxiv.org/pdf/1312.6082.pdf)
"CAPTCHA Recognition with Active Deep Learning", Stark et al. (https://vision.in.tum.de/_media/spezial/bib/stark-gcpr15.pdf)

where a particular sequence of chars is encoded as one binary vector. In my case the captchas contain at most 20 Latin chars, and each char is encoded as a 63-dim binary vector with a single bit set at a position determined as follows:

digits '0-9' - 1 at positions 0-9
uppercase letters 'A-Z' - 1 at positions 10-35
lowercase letters 'a-z' - 1 at positions 36-61
position 62 is reserved for the blank char '' (words shorter than 20 chars are padded with '' up to 20)

So finally, when I concatenate all 20 chars, I get a 20*63-dim vector which my network should learn. My main issue is how to define a proper loss function for the optimizer.
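The encoding described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the helper names `encode_captcha` and `decode_captcha` are hypothetical and do not come from the linked repository; only the alphabet order and the blank index follow the description.

```python
import string
import numpy as np

# Alphabet order as described: digits (0-9), uppercase (10-35), lowercase (36-61).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BLANK_INDEX = len(ALPHABET)       # position 62 is reserved for the blank char
NUM_CLASSES = len(ALPHABET) + 1   # 63 classes per character
MAX_LEN = 20                      # captchas contain at most 20 chars


def encode_captcha(text):
    """Hypothetical helper: flat 20*63 one-hot vector for one captcha."""
    if len(text) > MAX_LEN:
        raise ValueError("captcha longer than %d chars" % MAX_LEN)
    vec = np.zeros((MAX_LEN, NUM_CLASSES), dtype=np.float32)
    for pos in range(MAX_LEN):
        # Positions past the end of the text are filled with the blank class.
        idx = ALPHABET.index(text[pos]) if pos < len(text) else BLANK_INDEX
        vec[pos, idx] = 1.0
    return vec.reshape(-1)


def decode_captcha(vec):
    """Hypothetical inverse: argmax per character, blanks dropped."""
    idx = vec.reshape(MAX_LEN, NUM_CLASSES).argmax(axis=1)
    return "".join(ALPHABET[i] for i in idx if i != BLANK_INDEX)


encoded = encode_captcha("aZ3")
print(encoded.shape, decode_captcha(encoded))
```

The same `decode_captcha` can be applied to a row of network outputs, since it only relies on the argmax within each 63-wide block.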

Architecture of my network:

conv 3x3x32 -> relu -> pooling(k=2) -> dropout
conv 3x3x64 -> relu -> pooling(k=2) -> dropout
conv 3x3x64 -> relu -> pooling(k=2) -> dropout
FC 1024 -> relu -> dropout
Output 20*63

So my main issue is how to define the loss for the optimizer and how to evaluate the model. I have tried something like this:

# Construct model
pred = conv_net(x, weights, biases, keep_prob)

# Define loss and optimizer

# split the prediction per char: each char takes 63 contiguous positions, and we have 20 chars
split_pred = tf.split(1,20,pred)
split_y = tf.split(1,20,y)


#compute partial softmax cost, for each char
costs = list()
for i in range(20):
    costs.append(tf.nn.softmax_cross_entropy_with_logits(split_pred[i],split_y[i]))

#reduce cost for each char
rcosts = list()
for i in range(20):
    rcosts.append(tf.reduce_mean(costs[i]))

# global reduce    
loss = tf.reduce_sum(rcosts)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)


# Evaluate model

# pred has shape [batch_size, 20*63]; reshape it so that each character prediction
# is in its own row, then take the argmax of each row (across columns) and check
# whether it equals the index of the max in the original label,
# then sum all correct results and compute the mean (accuracy)

#batch, rows, cols
p = tf.reshape(pred,[batch_size,20,63])
#max idx across the rows
#max_idx_p=tf.argmax(p,2).eval()
max_idx_p=tf.argmax(p,2)

l = tf.reshape(y,[batch_size,20,63])
#max idx across the rows
#max_idx_l=tf.argmax(l,2).eval()
max_idx_l=tf.argmax(l,2)

correct_pred = tf.equal(max_idx_p,max_idx_l)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

I try to split off each char from the output and compute softmax and cross-entropy for each char separately, then combine all the costs. But I have mixed TensorFlow functions with normal Python lists. Can I do this? Will the TensorFlow engine understand it? Which TensorFlow functions can I use instead of Python lists?
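One way to avoid the Python lists altogether is to reshape the output so that every character becomes its own row and compute a single softmax cross-entropy over all rows. The sketch below uses NumPy as a stand-in for the TensorFlow ops (an assumption made so it runs without TensorFlow 0.9) and checks that the split-and-loop loss equals 20 times the mean of the reshaped per-character losses. The helper `softmax_xent` and the random data are illustrative only.

```python
import numpy as np


def softmax_xent(logits, labels):
    """Row-wise softmax cross-entropy for one-hot labels (NumPy stand-in)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)


rng = np.random.RandomState(0)
batch, n_chars, n_classes = 4, 20, 63
logits = rng.randn(batch, n_chars * n_classes)            # plays the role of pred
targets = rng.randint(n_classes, size=(batch, n_chars))
labels = np.eye(n_classes)[targets].reshape(batch, -1)    # plays the role of y

# Version 1: split into 20 slices, mean over the batch per char, then sum.
loss_loop = 0.0
for i in range(n_chars):
    cols = slice(i * n_classes, (i + 1) * n_classes)
    loss_loop += softmax_xent(logits[:, cols], labels[:, cols]).mean()

# Version 2: one reshape to [batch * 20, 63], one cross-entropy call.
per_char = softmax_xent(logits.reshape(-1, n_classes),
                        labels.reshape(-1, n_classes))
loss_reshape = n_chars * per_char.mean()

print(np.allclose(loss_loop, loss_reshape))
```

In TensorFlow the second version would presumably amount to a `tf.reshape` of `pred` and `y` followed by one `tf.nn.softmax_cross_entropy_with_logits` call and a `tf.reduce_mean`; whether to keep the factor of 20 only rescales the loss.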

The accuracy is computed in a similar manner: the output is reshaped to 20x63, and I take the argmax of each row and then compare it with the true encoded char.
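Note that the accuracy defined this way is a per-character accuracy that also counts the blank padding positions, so a model that predicts only blanks can still score highly when most captchas are short. A possible complement, sketched in NumPy under the same 20x63 layout, is to also report the accuracy on non-blank characters and on whole captchas. The function name `accuracies` is hypothetical.

```python
import numpy as np


def accuracies(pred, y, n_chars=20, n_classes=63, blank=62):
    """Return (per-char, per-char on non-blank labels, whole-captcha) accuracy."""
    p = pred.reshape(-1, n_chars, n_classes).argmax(axis=2)
    l = y.reshape(-1, n_chars, n_classes).argmax(axis=2)
    hits = (p == l)
    per_char = hits.mean()                      # what the question computes
    non_blank = (l != blank)
    per_char_non_blank = hits[non_blank].mean() if non_blank.any() else float("nan")
    per_captcha = hits.all(axis=1).mean()       # all 20 positions must match
    return per_char, per_char_non_blank, per_captcha


# Two captchas of length 2 ('a', 'b') padded with blanks up to 20 chars.
label_idx = np.full((2, 20), 62)
label_idx[:, :2] = [36, 37]
y_demo = np.eye(63)[label_idx].reshape(2, -1)

# A degenerate "model" that predicts blank everywhere still gets 18 of 20 right.
pred_demo = np.eye(63)[np.full((2, 20), 62)].reshape(2, -1)
print(accuracies(pred_demo, y_demo))
```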

When I run this, the loss decreases, but the accuracy rises and then falls. This picture shows how it looks: https://plon.io/files/57a0a7fb4bb1210001ca0476

I would be grateful for any comments, pointers to mistakes I have made, or ideas to try.

In newer TF versions you can use Python lists as input to reduce_sum. It's equivalent to calling tf.pack on the Python list first to convert it to a TensorFlow tensor. The accuracy graph looks weird. However, note that your cross-entropy loss is huge: when it is in the millions, decreases in cross-entropy won't necessarily improve accuracy. I would add an L2 penalty regularizer and wait until the cross-entropy is closer to zero. It also helps to start with a simpler problem (e.g., only digits) to get a sense of how long to wait. - Yaroslav Bulatov
I wonder whether loss = tf.nn.sigmoid_cross_entropy_with_logits(pred,y) wouldn't be more appropriate for this problem. The previous approach uses softmax_cross_entropy_with_logits, but that requires the classes to be mutually exclusive, so I split the output per character, compute softmax_cross_entropy for each, and sum over all 20 chars in the sequence. - ksopyla
The real problem was data normalization. My Xdata is a matrix [N,D]; once I standardized the images, the network started to learn patterns: x_mean = Xdata.mean(axis=0); x_std = Xdata.std(axis=0); X = (Xdata-x_mean)/(x_std+0.00001) - ksopyla

ksopyla · Accepted Answer · 2016-08-23T10:04:48

The real problem was that my network got stuck: its output was constant for any input.

When I changed the loss function to loss = tf.nn.sigmoid_cross_entropy_with_logits(pred,y) and normalized the input, the net started to learn the patterns.
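For reference, the sigmoid cross-entropy treats each of the 20*63 outputs as an independent binary target and returns one loss value per output, so it has the same shape as pred. Below is a NumPy sketch of the numerically stable form given in the TensorFlow documentation, max(x, 0) - x*z + log(1 + exp(-|x|)). The function name `sigmoid_xent` and the final reduction to a scalar with a mean are assumptions for illustration, not part of the original code.

```python
import numpy as np


def sigmoid_xent(logits, labels):
    """Element-wise sigmoid cross-entropy, stable for large |logits|."""
    return (np.maximum(logits, 0)
            - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))


rng = np.random.RandomState(0)
logits = rng.randn(4, 20 * 63)                       # plays the role of pred
targets = rng.randint(63, size=(4, 20))
labels = np.eye(63)[targets].reshape(4, -1)          # plays the role of y

elementwise = sigmoid_xent(logits, labels)           # shape (4, 1260)
scalar_loss = elementwise.mean()                     # assumed reduction

# Cross-check against the textbook formula on moderate logits.
probs = 1.0 / (1.0 + np.exp(-logits))
naive = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
print(elementwise.shape, np.allclose(elementwise, naive))
```

Unlike the per-character softmax, this loss does not force exactly one class per character, so the argmax over each 63-wide block is still needed when decoding.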

Standardization (subtract the mean and divide by the std) helps a lot.

Xdata is a matrix [N,D]:

x_mean = Xdata.mean(axis=0) 
x_std = Xdata.std(axis=0) 
X = (Xdata-x_mean)/(x_std+0.00001)
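A slightly extended sketch of the same standardization: compute the per-pixel statistics on the training set only and reuse them for validation and test images, so that all splits are transformed identically. The function names and the random stand-in data are hypothetical; the epsilon is the one used above and keeps constant pixels (zero std) from causing a division by zero.

```python
import numpy as np


def fit_standardizer(X_train, eps=1e-5):
    """Per-feature mean and scale computed from the training matrix [N, D]."""
    mean = X_train.mean(axis=0)
    scale = X_train.std(axis=0) + eps   # eps guards against zero-variance pixels
    return mean, scale


def standardize(X, mean, scale):
    """Apply previously fitted statistics to any matrix with D columns."""
    return (X - mean) / scale


rng = np.random.RandomState(0)
X_train = rng.rand(100, 5) * 255.0      # stand-in for flattened images
X_train[:, 0] = 7.0                     # a constant pixel, e.g. a border
X_test = rng.rand(10, 5) * 255.0

mean, scale = fit_standardizer(X_train)
X_train_std = standardize(X_train, mean, scale)
X_test_std = standardize(X_test, mean, scale)   # same statistics as training

print(X_train_std.mean(axis=0).round(3), X_train_std.std(axis=0).round(3))
```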

Data preprocessing is the key; it is worth reading http://cs231n.github.io/neural-networks-2/#data-preprocessing
