I'm using tesseract OCRwith python-tesseract. In the tesseract FAQ, regarding digits, we have:
Use
TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789");
BEFORE calling an Init function or put this in a text file called tessdata/configs/digits:
tessedit_char_whitelist 0123456789
and then your command line becomes:
tesseract image.tif outputbase nobatch digits
Warning: Until the old and new config variables get merged, you must have the nobatch parameter too.
In python-tesseract, the SetVariable method exists. I've tried this, but the result of the OCR is the same:
api = tesseract.TessBaseAPI()
api.SetVariable("tessedit_char_whitelist", "0123456789")
api.Init('.','eng',tesseract.OEM_DEFAULT)
api.SetPageSegMode(tesseract.PSM_AUTO)
Did anyone already got this working, or should I consider it a bug in python-tesseract?