How to piece together dashed lines in image before giving to Tesseract?

Question

I have screen images that contain digit values, and I want to recognize those digits with Tesseract 4.0. However, the digits are made of dashed lines, like those of a seven-segment display, and Tesseract can't recognize them because of the gaps between the dashes. When I joined the dashed lines into solid strokes in GIMP, Tesseract recognized the values almost correctly. I want to do the same with OpenCV. How can I join the dashed lines of the digits into one piece?

[Images: before joining process; after joining process]

Threshold your image to make it black and white. That should help Tesseract. — fmw42
Thank you. It helped, but a fixed threshold fails on other images. How can I make it adaptive? I am actually using YOLO to extract screen regions from images, and my plan is to preprocess those extracted regions before feeding them to Tesseract. Apparently this preprocessing step will negatively affect my data pipeline. — Ugurcan

Karol Żak · Accepted Answer · 2020-05-19T21:44:41

In my experience, Tesseract should easily recognize these numbers without any preprocessing.
Is it possible that this image is simply zoomed in too much, so that the numbers are too big and hard for Tesseract to recognize? I would try to work on that first, and if that doesn't help, then you can look into Morphological Transformations in OpenCV.
