Tesseract has trouble reading this extremely simple string of numbers

Question

I'm currently writing a script in Python that uses Tesseract to read a number like this:

[image: the number 6.881]

With the digits-only configuration and -psm 6 (or 7), it outputs 5.551 instead of 6.881.

I have had some success with other numbers (5.700 works), but this particular number is giving me a ton of problems. Unfortunately, I need a high degree of accuracy for my program, and I thought Tesseract would be able to decipher such a simple string.

I have also tried GOCR, which correctly read 6.881 (yay!) but gave the output 5._00 for 5.700 (boo!).

Any idea why it would be doing this?

Or, more importantly, is there anything I can do to get around the problem (preferably without having to train Tesseract)?

Karol S · Accepted Answer · 2013-11-13T13:32:33

I doubled its size and removed the transparency (replacing it with white) using ImageMagick (you can use something else if you want), and Tesseract then OCR'd the enhanced image correctly:

$ convert I1Zau.png -background white -flatten -resize 200% I1Zau_2.png
$ tesseract I1Zau_2.png o.txt
$ cat o.txt.txt 
6.881
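Since the script in question is written in Python, the same two steps could be driven from it with `subprocess`. This is only a minimal sketch: it assumes the ImageMagick `convert` and `tesseract` executables are on the PATH, and the function names `build_commands` and `ocr_number` and the `_ocr` output name are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_commands(src, enhanced, out_base):
    """Return two argv lists: ImageMagick preprocessing, then Tesseract."""
    preprocess = [
        "convert", str(src),
        "-background", "white",  # colour that replaces transparent pixels
        "-flatten",              # merge the image onto that background
        "-resize", "200%",       # double the image size
        str(enhanced),
    ]
    # Tesseract appends ".txt" to the output base name on its own.
    ocr = ["tesseract", str(enhanced), str(out_base)]
    return preprocess, ocr


def ocr_number(src, workdir="."):
    """Preprocess src, run Tesseract on the result, and return the text."""
    src = Path(src)
    workdir = Path(workdir)
    enhanced = workdir / (src.stem + "_2.png")
    out_base = workdir / (src.stem + "_ocr")
    for argv in build_commands(src, enhanced, out_base):
        # check=True raises CalledProcessError if a tool exits non-zero
        subprocess.run(argv, check=True)
    return Path(str(out_base) + ".txt").read_text().strip()
```

Calling `ocr_number("I1Zau.png")` would run the same commands as above and return whatever Tesseract wrote to its output file. Passing the output base without an extension avoids the doubled `o.txt.txt` name seen in the shell session.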
