I'm having trouble with pytesseract. I know that you can restrict Tesseract to a specific set of characters using command-line arguments:
tesseract input.tif output nobatch digits
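For comparison, the same command can be run from Python with the standard library's `subprocess` module. This is a minimal sketch assuming the `tesseract` binary is on `PATH`; the helper names `digits_command` and `run_digits` are hypothetical.

```python
import subprocess

def digits_command(image_path, output_base):
    """Build the argv for `tesseract <image> <outputbase> nobatch digits`.

    `nobatch` and `digits` are config-file names passed as trailing
    arguments, exactly as in the command line above.
    """
    return ["tesseract", image_path, output_base, "nobatch", "digits"]

def run_digits(image_path, output_base):
    # Assumes `tesseract` is on PATH; it writes <output_base>.txt on success.
    subprocess.run(digits_command(image_path, output_base), check=True)
    with open(output_base + ".txt", encoding="utf-8") as f:
        return f.read()

if __name__ == "__main__":
    # Only prints the argument list; it does not invoke tesseract.
    print(digits_command("input.tif", "output"))
```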
I found some people saying they can restrict Tesseract with the following lines in Python:
import tesseract  # the older python-tesseract binding, not pytesseract
ocr = tesseract.TessBaseAPI()
ocr.Init(".", "eng", tesseract.OEM_TESSERACT_ONLY)
ocr.SetVariable("tessedit_char_whitelist", "0123456789")
But this uses the Tesseract API directly, and I'm using pytesseract. I also tried:
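A `SetVariable(name, value)` call has a command-line counterpart in Tesseract's `-c name=value` option. Assuming pytesseract forwards the `config` string to the `tesseract` command as extra arguments, the whitelist could presumably be passed as below. The helper `whitelist_config` is hypothetical, and some Tesseract versions or engine modes may not honour the whitelist at all.

```python
def whitelist_config(chars):
    """Return a config string mirroring
    ocr.SetVariable("tessedit_char_whitelist", chars).

    No quoting is applied, so `chars` should not contain spaces or quotes.
    """
    return f"-c tessedit_char_whitelist={chars}"

config = whitelist_config("0123456789")
print(config)
# Usage with pytesseract (requires pytesseract, Pillow and the tesseract binary):
#   from PIL import Image
#   from pytesseract import image_to_string
#   print(image_to_string(Image.open("input.tif"), config=config))
```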
print(image_to_string(someimage, config='outputbase digits'))
But this doesn't work: I still get letters in my output. This is strange, because the following code does work:
print(image_to_string(screen, config='-psm 10'))
PSM stands for page segmentation mode, and mode 10 makes Tesseract treat the image as a single character. I don't understand why this works and the previous snippet doesn't, when both are command-line arguments to Tesseract.
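One way to see what each `config` string turns into is to reproduce, roughly, how a pytesseract-style wrapper assembles the `tesseract` command: the fixed arguments first, then the config string split like a shell command line. The function below is a simplified, hypothetical reconstruction, not pytesseract's actual source.

```python
import shlex

def build_tesseract_args(image_path, output_base, config="", lang=None):
    """Approximate the argv a pytesseract-style wrapper would run."""
    args = ["tesseract", image_path, output_base]
    if lang is not None:
        args += ["-l", lang]
    # The config string is split on whitespace (shell rules) and appended verbatim.
    args += shlex.split(config)
    return args

for cfg in ("-psm 10", "outputbase digits"):
    print(cfg, "->", build_tesseract_args("img.png", "out", cfg))
```

Both strings do reach Tesseract as arguments. In the second one, though, `outputbase` arrives as an extra word after the real output base. In Tesseract's usage line (`tesseract imagename outputbase [options...] [configfile...]`), `outputbase` is a placeholder for the output name, which the wrapper already supplies, so here it would presumably be read as the name of a config file.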
Can anyone help? I want to use both options together with a custom wordlist (which I created in Tesseract's config folder).
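Since Tesseract's usage line puts options before config files, combining the page segmentation mode with one or more config files would presumably mean putting both in a single `config` string, options first. A minimal sketch: `combined_config` is a hypothetical helper, `mywords` is a hypothetical name standing in for the custom config file, and the flag spelling depends on the Tesseract version (`-psm` in older 3.x releases, `--psm` in 4.x and later).

```python
def combined_config(psm, config_files, legacy_flag=False):
    """Build a config string: the PSM option first, then config-file names.

    legacy_flag=True uses the older `-psm` spelling;
    otherwise the `--psm` spelling of Tesseract 4.x and later.
    """
    flag = "-psm" if legacy_flag else "--psm"
    return " ".join([flag, str(psm), *config_files])

# "mywords" is a hypothetical custom config file in Tesseract's config folder.
print(combined_config(10, ["digits", "mywords"], legacy_flag=True))
# pytesseract usage (not run here):
#   image_to_string(screen, config=combined_config(10, ["digits", "mywords"]))
```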