2
votes

Tesseract seems to have problems recognizing basic alphanumeric codes. I've tried upscaling the image, changing to a monospace font and turning off the dictionary with no improvement in OCR quality.

The image below is recognized as the following:

i3DOIIH_My ActivitiesJ

MmRSes_My Accounm DBYCAe_My Submissions1

Hrti6_My Renewam

enter image description here

As you can see, the recognized characters are completely off.


2 Answers

2
votes

Your original image is 1508 x 1092 pixels for 4 lines of text plus vertical spacing, which seems too big.

After reducing the image to 503 x 364 pixels, with around 76 pixels of height for the characters, Tesseract gives a 100% correct OCR result on the text.
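For reference, here is a minimal sketch of that downscaling step in Python. It assumes the Pillow and pytesseract packages are installed; the function names, the 1/3 factor (which reproduces the 503 x 364 size above), and the `--psm 6` page segmentation mode are my own choices for illustration, not something taken from the original test.

```python
def scaled_size(width, height, factor):
    """Return the (width, height) obtained by scaling an image by `factor`."""
    return (max(1, round(width * factor)), max(1, round(height * factor)))


def ocr_downscaled(path, factor=1 / 3):
    """Shrink the image at `path` by `factor`, then run Tesseract on it."""
    # Imported lazily so scaled_size() works without these packages.
    from PIL import Image  # assumption: Pillow is installed
    import pytesseract     # assumption: pytesseract and Tesseract are installed

    with Image.open(path) as img:
        new_size = scaled_size(img.width, img.height, factor)
        small = img.resize(new_size, Image.LANCZOS)
    # "--psm 6" treats the image as one uniform block of text (an assumption
    # about this screenshot; other modes may work as well or better).
    return pytesseract.image_to_string(small, config="--psm 6")
```

Calling `ocr_downscaled("codes.png")` (a hypothetical file name) returns the recognized text as a string. The right factor depends on the image, so it is worth trying a few values.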

Font size and background color do affect the OCR result. The best results come from black text on a white background; otherwise, image preprocessing is likely required.
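One possible form of preprocessing is sketched below: binarize the grayscale image and flip it if the background turns out dark, so the text ends up black on white. The fixed threshold of 128 and the "background covers most pixels" heuristic are assumptions for illustration, as are the function names; Pillow is assumed to be installed for the file-loading part.

```python
def to_black_on_white(gray_rows, threshold=128):
    """Binarize rows of 0-255 grayscale values and make the background white."""
    binary = [[255 if p >= threshold else 0 for p in row] for row in gray_rows]
    flat = [p for row in binary for p in row]
    # Assumption: the background covers more pixels than the text does,
    # so a mostly-black result means light text on a dark background.
    if flat.count(0) > len(flat) / 2:
        binary = [[255 - p for p in row] for row in binary]
    return binary


def preprocess_for_ocr(path):
    """Load an image and return a black-on-white Pillow image in mode 'L'."""
    from PIL import Image  # assumption: Pillow is installed

    with Image.open(path) as img:
        gray = img.convert("L")  # 8-bit grayscale, one byte per pixel
    width, height = gray.size
    data = gray.tobytes()
    rows = [list(data[y * width:(y + 1) * width]) for y in range(height)]
    cleaned = to_black_on_white(rows)
    return Image.frombytes("L", (width, height),
                           bytes(p for row in cleaned for p in row))
```

The returned image can be passed straight to Tesseract. For large images a NumPy or OpenCV threshold would be much faster than this pure-Python loop; the version here just keeps the logic easy to follow.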

Hope this helps.

0
votes

Train Tesseract on this type of characters, including the special characters. Refer to Tesseract Training.