Tesseract simple image with text recognition, Help wanted to convert/transform image

Question

Hello, I'm trying to use the Tesseract OCR engine to recognize some letters in an image.

I converted the image with ImageMagick and the result looks good, but it's still not enough for correct recognition.

The original images:

The ImageMagick command used for the conversion:

convert input.jpg -fuzz 50% -fill black -opaque black -bordercolor white -border 2 -fill black -draw "color 0,0 floodfill" -alpha off -negate -units pixelsperinch -density 72 output.jpg

The result images:

The Tesseract command:

$ tesseract output.jpg out -psm 7

Output/result (expected text -> Tesseract output):

Text: AUGU -> AUOU

Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica Page 1

Text: VEGU -> VOR-OU

Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica Page 1

Text: EGUV -> E6UV

Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica Page 1

Text: USEA -> USSOEA

Your problem is likely due to rotated letters and numbers. My understanding is that OCR generally does not like rotated characters. It expects characters to be properly oriented for best recognition. But I am not an OCR expert. So I will defer to others that may know more. — fmw42
CONTINUED: Try an example that has letters that are not rotated. Does that work? — fmw42

Mark Setchell · Accepted Answer · 2017-07-05T09:33:36

Not sure if it was pure luck, as you have only provided a single image to test with, but I noticed you are using a noisy/fuzzy JPEG instead of a nice clean PNG. So I thresholded your image at 50%, saved it as a PNG, and Tesseract now recognises all four letters correctly:

convert yourImage.jpeg -threshold 50% clean.png
tesseract -psm 7 clean.png out
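
If you have several images to check, the same two steps could be wrapped in a small script along these lines. This is only a sketch: it assumes ImageMagick's convert and tesseract are on your PATH, the helper names clean_name and ocr_one and the "-clean" suffix are made up for illustration, and the 50% threshold has only been tried on the one image you posted.

```shell
#!/bin/sh
# Sketch: threshold each input image to a clean PNG, then OCR it as one text line.
# Assumes `convert` (ImageMagick) and `tesseract` are installed; names are hypothetical.

# Derive the PNG name for an input file, e.g. word1.jpg -> word1-clean.png
clean_name() {
    printf '%s-clean.png\n' "${1%.*}"
}

# Threshold one image at 50%, run Tesseract on it, and print the recognised text.
ocr_one() {
    src=$1
    png=$(clean_name "$src")
    base=${png%.png}
    convert "$src" -threshold 50% "$png" || return 1
    # -psm 7 = treat the image as a single text line, as in the question.
    # Later Tesseract 4 releases expect the option spelled --psm instead.
    tesseract "$png" "$base" -psm 7 || return 1
    cat "$base.txt"
}

# Process every file given on the command line; skip anything that is not a file.
for f in "$@"; do
    if [ -f "$f" ]; then
        ocr_one "$f"
    fi
done
```

Saved as, say, ocr.sh, it would be run as sh ocr.sh first.jpg second.jpg, leaving a thresholded PNG and a .txt file next to each input. If other images come out worse than this one, the threshold percentage is the first thing I would try adjusting.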
