3
votes

I followed the FAQ to make Tesseract recognize digits, but all I get is a bunch of text in the output file, despite having only numbers in my image.

My command line looks like this:

tesseract --tessdata-dir ./ ./input.jpg ./output/output digits

Any ideas what could be happening?

1
Are you using Tesseract 4.0 with LSTM? For that version you will need to use a different tessdata file (trained only on digits). – Dmitrii Z.
I just downloaded the latest version from their site, for Windows. – Artemix
Is the latest version you downloaded 4.0? – Dmitrii Z.
4.0-with-LSTM#400-alpha-for-windows – Artemix

1 Answer

5
votes

As mentioned in a Tesseract GitHub issue, you can't blacklist or whitelist characters with the Tesseract 4.0 LSTM engine. Instead, you should train the LSTM model on only the characters you expect in your image.

Thanks to Shreeshrii, you can try his 'experimental' digits traineddata from here
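As a rough sketch of how such a model would be selected: a traineddata file is chosen with the `-l` option, whereas a bare trailing word such as `digits` in the question's command is read as a config file name. The file name `digits.traineddata` and its location in `./` are assumptions for illustration; use whatever name the downloaded file actually has.

```shell
#!/bin/sh
# Sketch: select a digits-only LSTM model by language name instead of by config file.
# Assumption: the downloaded model was saved as ./digits.traineddata (hypothetical name).
TESSDATA_DIR=./
INPUT=./input.jpg
OUTPUT_BASE=./output/output
MODEL=digits   # must match <MODEL>.traineddata inside TESSDATA_DIR

# -l picks the traineddata file; a trailing bare word would be treated as a config file.
CMD="tesseract --tessdata-dir $TESSDATA_DIR $INPUT $OUTPUT_BASE -l $MODEL"
echo "$CMD"

# Run only when the binary and the model file are actually present.
if command -v tesseract >/dev/null 2>&1 && [ -f "$TESSDATA_DIR/$MODEL.traineddata" ]; then
    $CMD
fi
```

The script prints the command it would run, so it can be checked before the model file is in place.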

Please note that Tesseract 4.0 is still in the alpha stage. If you prefer, you can still use the 3.x versions of Tesseract, which support this out of the box. The Tesseract 3.04 tessdata is located here, and the library for Windows can be downloaded from here
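For the 3.x route, a minimal sketch might first check which major version is installed and then pass an explicit whitelist. The helper `major_version` is a hypothetical name introduced here, and the paths are the ones from the question; the version string format can differ between builds, so treat the parsing as an assumption.

```shell
#!/bin/sh
# Sketch: restrict recognition to digits on Tesseract 3.x, where whitelisting works.

# Extract the major version from the first line of `tesseract --version` output,
# e.g. "tesseract 3.04.01" -> 3. An optional leading "v" is tolerated.
major_version() {
    printf '%s\n' "$1" | sed -n '1s/^tesseract v\{0,1\}\([0-9][0-9]*\)\..*/\1/p'
}

if command -v tesseract >/dev/null 2>&1; then
    # Some builds print the version on stderr, so capture both streams.
    version_line=$(tesseract --version 2>&1 | head -n 1)
    if [ "$(major_version "$version_line")" = "3" ] && [ -f ./input.jpg ]; then
        # On 3.x either the bundled "digits" config or an explicit whitelist can be used.
        tesseract ./input.jpg ./output/output -c tessedit_char_whitelist=0123456789
    else
        echo "Tesseract 3.x or ./input.jpg not found; skipping" >&2
    fi
fi
```

The explicit `-c tessedit_char_whitelist=...` form is used here so that the allowed characters are visible in the command itself rather than hidden in a config file.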