python - pytesseract using tesseract 4.0 numbers only not working

Question

Has anyone managed to get numbers-only output when calling the latest version of Tesseract (4.0) from Python?

The code below worked in 3.05 but still returns non-digit characters in 4.0. I tried removing all config files except the digits file, and it still didn't work. Any help would be great.

im is an image of a date, black text on a white background:

import pytesseract

im = imageOfDate  # the image of the date described above
text = pytesseract.image_to_string(im, config='outputbase digits')
print(text)

Add the image to the question so answerers can see your problem. — thewaywewere
I went with stackoverflow.com/questions/9413216/… instead. — Cees Timmerman
@CuriousGeorge: Did you find a solution to your upgrade problem? — Jarl
Upgrading to v4.1.1 alone did not fix it for me. I also had to download the tessdata_fast version of the traineddata files. I am attaching a detailed shell script to install 4.1.1 from source. — Aritra Roy Gosthipaty

thewaywewere · Accepted Answer · 2017-10-05T15:38:09

You can restrict recognition to digits by passing tessedit_char_whitelist as a config option, as below.

# Older pytesseract releases also accepted boxes=False here; current ones do not.
ocr_result = pytesseract.image_to_string(
    image, lang='eng',
    config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')

Hope this helps.
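
If the whitelist still lets letters through, one workaround is to filter the OCR output afterwards. As far as I know, the LSTM engine in 4.0 ignores tessedit_char_whitelist and support only came back in 4.1, which would match the comments above, so treat that as an assumption to verify for your build. Below is a minimal sketch, not a definitive fix: keep_digits and ocr_digits are hypothetical helper names, and --psm 7 (single text line) is my assumption for a one-line date image rather than something from the original answer.

```python
ALLOWED_DIGITS = "0123456789"


def keep_digits(text, extra=""):
    """Return `text` with everything except digits and `extra` removed."""
    allowed = set(ALLOWED_DIGITS + extra)
    return "".join(ch for ch in text if ch in allowed)


def ocr_digits(image, extra=""):
    """OCR `image` and return only digits plus the characters in `extra`.

    `extra` must not contain whitespace or quotes, because it is pasted
    straight into the Tesseract config string.
    """
    import pytesseract  # imported here so keep_digits works without it

    # --psm 7 (single text line) is an assumption that suits a one-line date.
    config = "--psm 7 -c tessedit_char_whitelist=" + ALLOWED_DIGITS + extra
    raw = pytesseract.image_to_string(image, config=config)
    # Post-filter in case the engine ignored the whitelist.
    return keep_digits(raw, extra)


print(keep_digits("O5/1O/2017\n", extra="/"))  # prints 5/1/2017
```

For a date you would call something like ocr_digits(im, extra="/") so the separators survive. Note that the filter simply drops misread characters (an "O" that should have been a "0" is lost, not corrected), so it cleans the output but does not improve recognition.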
