Hello, I'm trying to use Tesseract OCR to recognize some letters in an image.

I preprocessed the image with ImageMagick's convert and the result looks good, but it's not enough for Tesseract to read the letters correctly.

The original images:

enter image description here

The ImageMagick command used for the conversion:

convert input.jpg -fuzz 50% -fill black -opaque black -bordercolor white -border 2 -fill black -draw "color 0,0 floodfill" -alpha off -negate -units pixelsperinch -density 72 output.jpg

The result images:

enter image description here

The OCR tesseract command:

$ tesseract output.jpg out -psm 7

Output/result (expected -> recognized):

Text: AUGU -> AUOU

Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica Page 1

Text: VEGU -> VOR-OU

Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica Page 1

Text: EGUV -> E6UV

Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica Page 1

Text: USEA -> USSOEA

Your problem is likely due to rotated letters and numbers. My understanding is that OCR generally does not like rotated characters. It expects characters to be properly oriented for best recognition. But I am not an OCR expert. So I will defer to others that may know more. - fmw42
CONTINUED: Try an example that has letters that are not rotated. Does that work? - fmw42
I got it working with another version of Tesseract, thank you! - J. Metal

1 Answer

Not sure if it was pure luck, as you have only provided a single image to test with, but I noticed you are using a noisy/fuzzy JPEG instead of a nice clean PNG. So I thresholded your image at 50% and saved it as a PNG, and Tesseract then recognises all four letters correctly:

convert yourImage.jpeg -threshold 50% clean.png
tesseract -psm 7 clean.png out
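
If you have several images to try, the two steps above could be wrapped in a small shell function. This is only a rough sketch: the function name ocr_clean, the _clean.png suffix, and the DRY_RUN and PSM_OPT variables are my own placeholders, and I have only confirmed the 50% threshold on the one image you posted. Note that newer Tesseract releases spell the option --psm, so you may need to override PSM_OPT depending on your version.

```shell
#!/bin/sh
# Sketch: threshold a noisy JPEG into a clean PNG, then OCR it as one text line.
# Assumes ImageMagick's `convert` and `tesseract` are on PATH.
# PSM_OPT and DRY_RUN are hypothetical knobs for this sketch only:
#   PSM_OPT  option name for the page segmentation mode (default: -psm)
#   DRY_RUN  set to 1 to print the commands instead of running them

ocr_clean() {
    src=$1
    base=${src%.*}                 # input path without its extension
    clean="${base}_clean.png"      # thresholded copy written next to the input
    psm=${PSM_OPT:--psm}           # use --psm on newer Tesseract versions

    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "convert $src -threshold 50% $clean"
        printf '%s\n' "tesseract $psm 7 $clean $base"
    else
        # Tesseract writes the recognised text to "$base.txt"
        convert "$src" -threshold 50% "$clean" &&
            tesseract "$psm" 7 "$clean" "$base"
    fi
}
```

For example, `for f in *.jpg; do ocr_clean "$f"; done` would leave one .txt file per image. Running with DRY_RUN=1 first lets you check the commands before anything is written.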