I have a small research project where I try to decode some captcha images. I use a convnet implemented in TensorFlow 0.9, based on the MNIST example (https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py)
My code is available on GitHub: https://github.com/ksopyla/decapcha/blob/master/decaptcha_convnet.py
I am trying to reproduce the idea described in:
- "Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks", Goodfellow et al. (https://arxiv.org/pdf/1312.6082.pdf)
- "CAPTCHA Recognition with Active Deep Learning", Stark et al. (https://vision.in.tum.de/_media/spezial/bib/stark-gcpr15.pdf)
where a particular sequence of chars is encoded as one binary vector. In my case the captchas contain at most 20 Latin chars, and each char is encoded as a 63-dim binary vector in which exactly one bit is set, at the position given by:
- digits '0-9' - 1 at position 0-9
- uppercase letters 'A-Z' - 1 at position 10-35
- lowercase letters 'a-z' - 1 at position 36-61
- position 62 is reserved for the blank char '' (words shorter than 20 chars are padded with '' up to 20)
So finally, when I concatenate all 20 chars, I get a 20*63-dim vector which my network should learn. My main issue is how to define a proper loss function for the optimizer.
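The encoding described above can be sketched in plain Python as follows. This is only an illustration of the scheme, not the code from the linked repository; the names `encode_text` and `decode_vector` are hypothetical.

```python
import string

# Alphabet order from the question: digits, uppercase, lowercase (62 symbols).
# Index 62 is the blank/padding class.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BLANK_INDEX = len(ALPHABET)      # 62
NUM_CLASSES = len(ALPHABET) + 1  # 63
MAX_LEN = 20


def encode_text(text, max_len=MAX_LEN):
    """Return a flat list of max_len * 63 zeros and ones (one 1 per char slot)."""
    if len(text) > max_len:
        raise ValueError("text is longer than max_len")
    vector = [0] * (max_len * NUM_CLASSES)
    for pos in range(max_len):
        if pos < len(text):
            # str.index raises ValueError for characters outside the alphabet
            class_index = ALPHABET.index(text[pos])
        else:
            class_index = BLANK_INDEX
        vector[pos * NUM_CLASSES + class_index] = 1
    return vector


def decode_vector(vector):
    """Take the arg-max in each 63-wide slot and drop blank slots."""
    chars = []
    for pos in range(len(vector) // NUM_CLASSES):
        slot = vector[pos * NUM_CLASSES:(pos + 1) * NUM_CLASSES]
        class_index = slot.index(max(slot))
        if class_index != BLANK_INDEX:
            chars.append(ALPHABET[class_index])
    return "".join(chars)


encoded = encode_text("aB3")
```

Here `encoded` has 1260 entries, exactly 20 of which are set, and `decode_vector(encoded)` gives back `"aB3"`.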
Architecture of my network:
- conv 3x3x32 -> relu -> pooling(k=2) -> dropout
- conv 3x3x64 -> relu -> pooling(k=2) -> dropout
- conv 3x3x64 -> relu -> pooling(k=2) -> dropout
- FC 1024 -> relu -> dropout
- Output 20*63
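To get a feeling for how large the fully connected layers of such a network are, the shapes can be worked out by hand. The sketch below assumes 'SAME' padding with stride-1 convolutions (so only the pooling layers shrink the image) and a hypothetical 64x304 input; neither is stated in the question, so substitute the real values.

```python
import math


def same_pool_out(size, k=2):
    # With 'SAME' padding and stride k the output length is ceil(size / k).
    return math.ceil(size / k)


def flattened_size(height, width, pool_layers=3, last_channels=64):
    # Apply the three k=2 pooling steps, then flatten H x W x channels.
    for _ in range(pool_layers):
        height, width = same_pool_out(height), same_pool_out(width)
    return height * width * last_channels


IMG_H, IMG_W = 64, 304                      # hypothetical input size (assumption)
fc_inputs = flattened_size(IMG_H, IMG_W)    # inputs to the FC 1024 layer
fc_weights = fc_inputs * 1024               # weights in the FC 1024 layer
out_weights = 1024 * 20 * 63                # weights in the 20*63 output layer
print(fc_inputs, fc_weights, out_weights)
```

Under these assumptions almost all parameters sit in the first fully connected layer, which is one reason dropout or an L2 penalty there can matter.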
So my main issue is how to define the loss for the optimizer and how to evaluate the model. I have tried something like this:
# Construct model
pred = conv_net(x, weights, biases, keep_prob)
# Define loss and optimizer
# split the prediction per char: each char takes 63 contiguous positions, and we have 20 chars
split_pred = tf.split(1,20,pred)
split_y = tf.split(1,20,y)
# compute the softmax cross-entropy cost for each char
costs = list()
for i in range(20):
    costs.append(tf.nn.softmax_cross_entropy_with_logits(split_pred[i],split_y[i]))
# reduce the cost for each char (mean over the batch)
rcosts = list()
for i in range(20):
    rcosts.append(tf.reduce_mean(costs[i]))
# global reduce
loss = tf.reduce_sum(rcosts)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
# Evaluate model
# pred has shape [batch_size, 20*63]; reshape it so that each character prediction
# is in its own row, take the argmax of each row (across columns), then check whether
# it equals the argmax of the original label,
# then average the matches to get the accuracy
# batch, rows, cols
p = tf.reshape(pred,[batch_size,20,63])
# index of the max within each row
#max_idx_p=tf.argmax(p,2).eval()
max_idx_p=tf.argmax(p,2)
l = tf.reshape(y,[batch_size,20,63])
# index of the max within each row
#max_idx_l=tf.argmax(l,2).eval()
max_idx_l=tf.argmax(l,2)
correct_pred = tf.equal(max_idx_p,max_idx_l)
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
I split each char from the output, compute softmax and cross-entropy for each char separately, and then combine all the costs. But I have mixed TensorFlow functions with normal Python lists. Can I do this? Will the TensorFlow engine understand it? Which TensorFlow functions can I use instead of Python lists?
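As an illustration of what "instead of Python lists" could look like, here is a NumPy sketch (not TensorFlow code) showing that the split-and-loop loss equals a loss computed after reshaping to (batch, 20, 63). The helper names are hypothetical, and the data is random; the point is only that a reshape can replace the list of 20 chunks.

```python
import numpy as np

NUM_CHARS, NUM_CLASSES = 20, 63


def softmax_cross_entropy(logits, labels):
    """Cross-entropy along the last axis, using a numerically stable log-softmax."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(labels * log_softmax).sum(axis=-1)


def loss_with_list(pred, y):
    # Mirrors the question: 20 chunks, mean over the batch per char, then sum.
    pred_chunks = np.split(pred, NUM_CHARS, axis=1)
    y_chunks = np.split(y, NUM_CHARS, axis=1)
    per_char = [softmax_cross_entropy(p, t).mean()
                for p, t in zip(pred_chunks, y_chunks)]
    return sum(per_char)


def loss_with_reshape(pred, y):
    # Same quantity without a list: reshape to (batch, 20, 63) first.
    p = pred.reshape(-1, NUM_CHARS, NUM_CLASSES)
    t = y.reshape(-1, NUM_CHARS, NUM_CLASSES)
    per_position = softmax_cross_entropy(p, t)   # shape (batch, 20)
    return per_position.mean(axis=0).sum()


rng = np.random.RandomState(0)
batch = 4
pred = rng.randn(batch, NUM_CHARS * NUM_CLASSES)          # fake logits
classes = rng.randint(0, NUM_CLASSES, size=(batch, NUM_CHARS))
y = np.eye(NUM_CLASSES)[classes].reshape(batch, -1)       # one-hot labels

print(loss_with_list(pred, y), loss_with_reshape(pred, y))
```

Both functions return the same number up to floating-point error, because a row-major reshape puts each contiguous block of 63 values into its own row, exactly like the split.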
The accuracy is computed in a similar manner: the output is reshaped to 20x63, and I take the argmax of each row and then compare it with the true encoded char.
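The accuracy computation described above can be sketched in NumPy as below. One thing this per-character metric may hide, assuming many captchas are much shorter than 20 chars, is that padding positions are easy to predict and can dominate the mean, so the sketch also reports accuracy on non-blank positions and on whole sequences. The function and variable names are hypothetical.

```python
import numpy as np

NUM_CHARS, NUM_CLASSES, BLANK = 20, 63, 62


def accuracies(pred, y):
    # -1 lets the batch dimension be inferred instead of hard-coding batch_size.
    p = pred.reshape(-1, NUM_CHARS, NUM_CLASSES).argmax(axis=2)
    t = y.reshape(-1, NUM_CHARS, NUM_CLASSES).argmax(axis=2)
    correct = p == t
    per_char = correct.mean()                   # the metric from the question
    non_blank = t != BLANK
    per_char_non_blank = (correct[non_blank].mean()
                          if non_blank.any() else float("nan"))
    whole_sequence = correct.all(axis=1).mean() # all 20 positions must match
    return per_char, per_char_non_blank, whole_sequence


def one_hot(classes):
    return np.eye(NUM_CLASSES)[classes].reshape(len(classes), -1)


# Two toy labels: "ab" and "7", both padded with blanks up to 20 chars.
labels = np.full((2, NUM_CHARS), BLANK)
labels[0, :2] = [36, 37]
labels[1, :1] = [7]

# A degenerate "model" that predicts blank at every position.
all_blank = np.full((2, NUM_CHARS), BLANK)

per_char, per_char_non_blank, whole = accuracies(one_hot(all_blank), one_hot(labels))
```

In this toy case the per-character accuracy is 0.925 (37 of 40 positions) even though no real character and no whole captcha is predicted correctly.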
When I run this, the loss decreases, but the accuracy rises and then falls.
This picture shows how it looks: https://plon.io/files/57a0a7fb4bb1210001ca0476
I would be grateful for any further comments, pointers to mistakes I have made, or ideas to implement.
[...] reduce_sum. It's equivalent to calling tf.pack on the Python list first to convert it to a TensorFlow tensor. The accuracy graph looks weird; however, note that your cross-entropy loss is huge. When it's in the millions, decreases in cross-entropy won't necessarily improve accuracy. I would add an L2 penalty regularizer and wait until the cross-entropy is closer to zero. It also helps to start with a simpler problem (i.e., only digits) to get a sense of how long to wait. – Yaroslav Bulatov
[...] loss = tf.nn.sigmoid_cross_entropy_with_logits(pred,y) wouldn't be more appropriate. The previous approach uses softmax_cross_entropy_with_logits, but the classes should be mutually exclusive, so I split each character, compute softmax_cross_entropy, and sum over all 20 chars in the sequence. – ksopyla
[...] x_mean = Xdata.mean(axis=0); x_std = Xdata.std(axis=0); X = (Xdata-x_mean)/(x_std+0.00001) – ksopyla
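The standardization from the last comment can be written as a small, reusable NumPy helper. This is a sketch under assumptions: the images are taken to be flattened into rows of a 2-D array, the data here is random, and the names `fit_standardizer` and `standardize` are hypothetical. The statistics are computed on the training set and then reused for held-out data.

```python
import numpy as np


def fit_standardizer(X_train, eps=1e-5):
    """Per-feature mean and (std + eps), computed on the training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    return mean, std + eps      # eps avoids division by zero for constant pixels


def standardize(X, mean, std_eps):
    return (X - mean) / std_eps


rng = np.random.RandomState(0)
X_train = rng.uniform(0, 255, size=(100, 12))   # hypothetical flattened images
X_train[:, 0] = 255.0                           # a constant pixel: its std is 0
X_test = rng.uniform(0, 255, size=(10, 12))

mean, std_eps = fit_standardizer(X_train)
X_train_n = standardize(X_train, mean, std_eps)
X_test_n = standardize(X_test, mean, std_eps)   # reuse the training statistics
```

The constant column ends up as all zeros rather than NaN, which is what the small epsilon in the comment's formula is for.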