0
votes

I have some sample images of product tags that contain only numbers. I have preprocessed the images so that they can be used for digit recognition. I used the English trained data file, but the results were really bad. Is there a way to train a data set using template images?

I have referred to the Tesseract training documentation, but I couldn't train using the images.

Also, once I have the box file, how do I create the eng.traineddata file?

Can someone please help me?

This is the cropped original image of the product tag: http://imgur.com/hNNlX9g

This is the processed image of the product tag: http://imgur.com/Kzxtu0M

2 Answers

0
votes

You could try setting a whitelist of characters to be recognised (digits in your case). The parameter is called tessedit_char_whitelist. Honestly, the results can be mixed.
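As a rough sketch of how the whitelist could be passed, the snippet below builds a tesseract command line from Python and runs it. It assumes the tesseract executable (4.x or later, for the `--psm` spelling) is on the PATH; the helper names `build_whitelist_command` and `recognize_digits`, the file name `tag.png`, and the choice of page segmentation mode 7 (single text line) are illustrative assumptions, not something from the question.

```python
import shutil
import subprocess


def build_whitelist_command(image_path, whitelist="0123456789", psm=7):
    """Build a tesseract CLI call that restricts output to `whitelist`.

    Hypothetical helper: psm=7 (single text line) is an assumption and
    may need tuning for a given product tag layout.
    """
    return [
        "tesseract",
        image_path,
        "stdout",                      # print the result instead of writing a file
        "--psm", str(psm),
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def recognize_digits(image_path):
    """Run tesseract on one image and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    completed = subprocess.run(
        build_whitelist_command(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


# Example call (requires tesseract and an image file):
# print(recognize_digits("tag.png"))
```

Whether the whitelist is honoured depends on the Tesseract version and the traineddata in use, so it is worth checking the output on a few known tags first.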

0
votes

You can only use whitelisting if you have a traineddata set that supports it. If you want a fast result, use Tesseract 3.x; there should be plenty of traineddata files available that support whitelisting (which works very well).

I myself used Tesseract 4 with a traineddata file that worked very well with the following options: -l digits --psm 10

See this post for the link to the data set: "Can not find Tesseract 4.0 tessdata only for Numbers"
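For reference, the options from this answer could be wired up roughly as follows. This is only a sketch: it assumes Tesseract 4 is on the PATH and that a digits.traineddata file (such as the one from the linked post) has been placed in the tessdata directory. The function names and the file name `digit.png` are made up for illustration.

```python
import subprocess


def build_digits_command(image_path, lang="digits", psm=10):
    """Build the CLI call `tesseract <image> stdout -l digits --psm 10`.

    psm=10 tells Tesseract to treat the image as a single character;
    other modes (for example 7, a single text line) can be passed instead.
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def run_tesseract(command):
    """Execute a prepared tesseract command and return its text output."""
    completed = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    return completed.stdout.strip()


# Example call (requires tesseract plus digits.traineddata):
# print(run_tesseract(build_digits_command("digit.png")))
```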