5
votes

I've been trying to work on an image processing script /OCR that will allow me to extract the letters (using tesseract) from the boxes found in the image below.

http://i622.photobucket.com/albums/tt310/seraphelitis/rename_zps80dcdd06.png

Following alot of processing, I was able to get the picture to look like this

3

In order to remove the noise I inverted the image followed by floodfilling and gaussian blurring to remove noise. This is what I ended up with next.

4

After running it through some threholding and erosion to remove the noise (erosion being the step that distorted the text) I was able to get the image to look like this before running it through tesseract

enter image description here

This, while a pretty good rendering, allows for fairly accurate results through tesseract. Though it sometimes fails because it reads the hash (#) as a H or W. This leads me to my question!

Is there a way using opencv, skimage, PIL (opencv preferably) I can sharpen this image in order to increase my chances of tesseract properly reading my image? OR Is there a way I can get from the third to final image WITHOUT having to use erosion which ultimately distorted the text in the image.

Any help would be greatly appreciated!

1
Can you share which preprocessing you already do ?Dabo
I added a few of the key steps that caused the distortion I was referring to : )JamesLLee
I've demonstrated step by step how to achieve this on this answer.karlphillip
thanks Karl , this looks extremely useful though I'm borderline illiterate when it comes to C++ ( I used python) : )JamesLLee

1 Answers

4
votes

OpenCV does has functions like filter2D that convolves arbitrary kernel with given image. In particular you can use kernels that are used for image sharpening. The main question is whether this will improve the results of your OCR library or not. The image is already pretty sharp and the noise in the image is not a result of blur. I never worked with teseract myself, but I am fairly sure that it already does all the noise reduction it could. And 'helping' him in this process may actually have opposite effect. For example any sharpening process tends to amplify noise (as opposite to noise reduction processes that usually are blurring images). Most of computer vision libraries give better results when provided with raw (unprocessed) images.

Edit (after question update): There multiple ways to do so. The first one that I would test is this: Your first binary image is pretty clean and sharp. Instead of of using morphological operations that reduce quality of letters switch to filtering contours. Use findContours function to find all contours in the image and store their hierarchy (i.e. which contour is inside which). From all the found contours you actually need only the contours on first and second levels, i.e. outer and inner contours of each letter (contours at zero level are the outermost contours). Other contours can be discarded. Among the contours that do belong to first level you can discard those whose bounding box is too small to be a real letter. After those two discarding procedures I would expect that most of the remaining contours are the ones that are parts of the letters. Draw them on white image and run OCR. (If you want white letters on black background you will need to invert the order of vertices in the contours).