Tesseract OCR can't recognize basic alphanumeric codes

Question

Tesseract seems to have problems recognizing basic alphanumeric codes. I've tried upscaling the image, changing to a monospace font and turning off the dictionary with no improvement in OCR quality.

The image below is recognized as the following:

i3DOIIH_My ActivitiesJ

MmRSes_My Accounm DBYCAe_My Submissions1

Hrti6_My Renewam

As you can see the recognized characters are completely off.

thewaywewere · Accepted Answer · 2017-10-01T16:50:53

Your original image is 1508 x 1092 pixels and contains only 4 lines of text plus vertical spacing, which seems too big for Tesseract.

After reducing the image to 503 x 364 pixels, with a character height of around 76 pixels, Tesseract gives a 100% correct OCR result on the text.
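The downscaling step can be sketched in Python as follows. This is only an illustration, assuming the Pillow and pytesseract packages are installed; the helper names `scaled_size` and `ocr_downscaled`, the default factor of 1/3, and the choice of Lanczos resampling are assumptions, not something stated in the answer.

```python
def scaled_size(width, height, factor):
    """Return (width, height) scaled by factor, rounded to whole pixels."""
    return (max(1, round(width * factor)), max(1, round(height * factor)))


def ocr_downscaled(path, factor=1 / 3):
    """Shrink the image at `path` by `factor`, then run Tesseract on it.

    Hypothetical helper: the factor of 1/3 maps 1508 x 1092 to 503 x 364.
    """
    # Imported here so scaled_size() stays usable without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        new_size = scaled_size(img.width, img.height, factor)
        small = img.resize(new_size, Image.LANCZOS)
        return pytesseract.image_to_string(small)
```

Calling `scaled_size(1508, 1092, 1 / 3)` gives `(503, 364)`, the size used above. The right factor depends on the source image, so it is worth trying a few values and comparing the output.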

The font size and background color both affect the OCR result. The best result is obtained from black text on a white background. Otherwise, image preprocessing is likely required.

Hope this helps.
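One possible form of that preprocessing is a simple grayscale threshold that turns the image into black text on a white background. The sketch below assumes Pillow is installed; the function names, the threshold of 128, and the `invert` flag for light-on-dark text are illustrative assumptions and would need tuning for a real image.

```python
def threshold_table(threshold=128, invert=False):
    """Build a 256-entry lookup table mapping gray levels to 0 or 255.

    With invert=True, light pixels become black, which suits light text
    on a dark background.
    """
    dark, light = (255, 0) if invert else (0, 255)
    return [dark if level < threshold else light for level in range(256)]


def to_black_on_white(img, threshold=128, invert=False):
    """Convert a Pillow image to a two-level grayscale image.

    Hypothetical helper: `img` is expected to be a PIL.Image.Image.
    """
    gray = img.convert("L")  # 8-bit grayscale
    return gray.point(threshold_table(threshold, invert))
```

The result can be passed to `pytesseract.image_to_string` in place of the original image.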
