I want to recognize digits from images, so I have been experimenting with Tesseract and Python. I preprocess the images with OpenCV and thought the results looked clean (see the examples below), but Tesseract still makes a lot of errors on them, and I am disappointed by how poorly the digits are recognized.
Am I expecting too much? Looking at the example images, I would think Tesseract should identify these digits without any problems. Is the accuracy simply not there yet, or is my configuration not optimal? Any help or direction would be appreciated.
Things I tried to improve the digit recognition (none of them improved the results significantly):
- Limit the recognized characters to digits:
config = "--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789"
- Upscale the images
- Add a white border around the image to give the digits more space, as I have read that this improves recognition
- Threshold the image so it contains only black and white pixels
Examples:
Image 1:
Image 2:
EDIT: Image 3:
Tesseract recognized: 1723