How to train Tesseract to identify only numbers

Question

I have some sample images of product tags that contain only numbers. I managed to preprocess those images so that they could be used to recognize the digits. I used the English trained data file, but the results were really bad. Is there a way I can train a data set using template images?

I have read the documentation on training Tesseract, but I couldn't get training to work with my images.

Once I have the box file, how can I make the eng.traineddata file?

Can someone please help me?

This is the cropped original image of the product tag http://imgur.com/hNNlX9g

This is the processed image of the product tag http://imgur.com/Kzxtu0M

Remon Nashid · Accepted Answer · 2013-10-29T23:56:20

You could try setting a whitelist of characters to be recognised (the digits 0-9 in your case). The configuration variable is called tessedit_char_whitelist. Honestly, the results may be mixed.
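A minimal sketch of the whitelist approach, calling the tesseract command-line tool from Python. It assumes the tesseract binary is on PATH and that your Tesseract version accepts "-c VAR=VALUE" on the command line and "stdout" as the output base (true of recent releases; older 3.x builds may differ). The helper names build_tesseract_command, keep_digits and recognise_digits are hypothetical, not part of any Tesseract API. Because the whitelist may not be honoured by every engine or version, the sketch also strips any non-digit characters from the output as a fallback.

```python
from __future__ import annotations

import subprocess

DIGITS = "0123456789"


def build_tesseract_command(image_path: str, output_base: str,
                            whitelist: str = DIGITS) -> list[str]:
    """Build a tesseract CLI call that restricts output to `whitelist` characters.

    Hypothetical helper; assumes the installed tesseract supports `-c VAR=VALUE`.
    """
    return [
        "tesseract",
        image_path,
        output_base,
        "-c",
        f"tessedit_char_whitelist={whitelist}",
    ]


def keep_digits(text: str) -> str:
    """Drop every character that is not an ASCII digit 0-9.

    Used as a fallback in case the whitelist is ignored by the engine.
    """
    return "".join(ch for ch in text if ch in DIGITS)


def recognise_digits(image_path: str) -> str:
    """Run tesseract on one (already preprocessed) image and return only digits.

    Assumes "stdout" is accepted as the output base, so the recognised
    text is printed instead of being written to a file.
    """
    cmd = build_tesseract_command(image_path, "stdout")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return keep_digits(result.stdout)
```

For example, recognise_digits("tag.png") would run tesseract on a hypothetical tag.png and return a string such as "1245" with any stray letters or spaces removed. As an alternative to "-c", many Tesseract installs ship a "digits" config file that sets the same variable, and some 4.0.x LSTM builds ignored the whitelist entirely, which is one reason results can be mixed.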
