Is there a way to use tesseract for single digit numbers?

Question

TL;DR: Tesseract appears to be unable to recognize images that consist of a single digit. Is there a workaround, or a reason for this?

I am using (the digits-only version of) Tesseract to automate entering invoices into our system. However, Tesseract seems unable to recognize single-digit numbers such as the following:

The raw scan after cropping:

[image: raw cropped scan of a single digit]

After some image enhancement:

[image: the same digit after enhancement]

It works fine if the number has at least two digits:

[image: a number with at least two digits, recognized correctly]

I've tested a couple of other figures:

Not working: [images omitted]

Working: [images omitted]

If it helps: for my purpose, all inputs to Tesseract have been cropped and rotated as above. I am using pyocr as a bridge between my project and Tesseract.

Accepted answer by vSomers, 2017-07-24 19:51:46

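Tesseract's default page segmentation mode (3, fully automatic page segmentation) expects a block of text and can fail to find anything in an image that contains only one character. Setting the page segmentation mode (PSM) to 10 tells Tesseract to treat the whole image as a single character instead. Here's how you can configure pyocr to do that for individual digits: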

from PIL import Image
import sys
import pyocr
import pyocr.builders

tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
tool = tools[0]

im = Image.open('digit.png')
builder = pyocr.builders.DigitBuilder()

# Set the page segmentation mode to 10: treat the image as a single character.
builder.tesseract_layout = 10  # read by the libtesseract backend
builder.tesseract_flags = ['-psm', '10']  # read by the tesseract CLI backend (Tesseract 3.x syntax; 4+ uses '--psm')

result = tool.image_to_string(im, lang="eng", builder=builder)
print(result)
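Since the question's crops mix single-digit and multi-digit numbers, one possible workaround is to try a line-oriented mode (PSM 7, "single text line") first and fall back to single-character mode (PSM 10) only when that yields no digits. The sketch below is an illustration under assumptions, not part of pyocr: `read_number`, `psm_flags`, and the `ocr` callable are hypothetical names, and the order in which modes are tried is a guess you may want to tune. It also picks the PSM option spelling by Tesseract version, because the `-psm` flag used above is the Tesseract 3.x syntax and Tesseract 4 and later expect `--psm`.

```python
from typing import Callable, Iterable, List

# Tesseract page segmentation modes used below (numbering as in Tesseract's PSM list).
PSM_SINGLE_LINE = 7   # treat the image as a single text line
PSM_SINGLE_CHAR = 10  # treat the image as a single character


def psm_flags(psm: int, tesseract_major_version: int) -> List[str]:
    """Build the command-line flags for a page segmentation mode.

    Tesseract 3.x spells the option '-psm'; Tesseract 4 and later use '--psm'.
    """
    option = "-psm" if tesseract_major_version < 4 else "--psm"
    return [option, str(psm)]


def read_number(
    ocr: Callable[[int], str],
    layouts: Iterable[int] = (PSM_SINGLE_LINE, PSM_SINGLE_CHAR),
) -> str:
    """Try each page segmentation mode in order; return the first all-digit result.

    `ocr` is a hypothetical callable that runs OCR on one pre-cropped image
    with the given page segmentation mode and returns the raw text.
    Returns "" if no mode produces a string made only of digits.
    """
    for layout in layouts:
        text = ocr(layout).strip()  # drop the trailing newline/whitespace Tesseract emits
        if text.isdigit():          # "" is not a digit string, so empty output falls through
            return text
    return ""
```

With pyocr, `ocr` could be a small closure that creates a `DigitBuilder`, sets `tesseract_layout` and `tesseract_flags` as in the answer (using `psm_flags(layout, tool.get_version()[0])`, assuming the selected tool module exposes `get_version()`), and returns `tool.image_to_string(...)`. That wiring is an untested assumption; keeping the OCR call behind a callable is what lets the fallback logic be checked without Tesseract installed.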
