Good afternoon,
I am writing an OCR program to detect text in images. So far I get good results, but only when the text is black and the background is white. What can I do to improve results for images that have white text on a light-colored background (yellow, green, etc.)?
One original example image could be:
So far I am just converting it to grayscale using:
image = image.convert('L')
Then I apply a series of filters, for example SHARPEN, SMOOTH, BLUR, etc.
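To make the filter step concrete, here is a minimal sketch of what I mean, assuming the filters come from Pillow's `ImageFilter` module. The helper name `apply_filters`, the filter order, and the small uniform test image are only placeholders for illustration, not my real data:

```python
from PIL import Image, ImageFilter

def apply_filters(image, filters=(ImageFilter.SHARPEN,
                                  ImageFilter.SMOOTH,
                                  ImageFilter.BLUR)):
    """Apply each Pillow filter in order and return the filtered image."""
    for f in filters:
        image = image.filter(f)
    return image

# Placeholder input (assumption): a small uniform grayscale image
# standing in for the real photo.
demo = Image.new('L', (32, 32), 200)
filtered = apply_filters(demo)
```

The order and choice of filters here is arbitrary, which is part of what I am asking about.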
Then I binarize the image like this:
image = image.point(lambda x: 0 if x<128 else 255, '1') #refers to http://stackoverflow.com/questions/18777873/convert-rgb-to-black-or-white and also to http://stackoverflow.com/questions/29923827/extract-cow-number-from-image
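To show the problem in a reproducible way, here is a small sketch that runs the same grayscale and fixed-threshold steps on a synthetic image. The white rectangle on a yellow background is only a stand-in I made up for white text on a light background, and `binarize` is just a wrapper name for the two lines above:

```python
from PIL import Image, ImageDraw

def binarize(image, threshold=128):
    """Convert to grayscale, then apply the fixed threshold from above."""
    gray = image.convert('L')
    return gray.point(lambda x: 0 if x < threshold else 255, '1')

# Synthetic stand-in (assumption): a white bar on a yellow background.
sample = Image.new('RGB', (40, 20), (255, 255, 0))
ImageDraw.Draw(sample).rectangle([10, 5, 30, 15], fill=(255, 255, 255))

gray = sample.convert('L')
background_level = gray.getpixel((0, 0))   # gray level of the yellow area
text_level = gray.getpixel((20, 10))       # gray level of the white bar

result = binarize(sample)
# If both pixels end up with the same value, the "text" has been lost.
text_lost = result.getpixel((0, 0)) == result.getpixel((20, 10))
print(background_level, text_level, text_lost)
```

If I understand it correctly, both the yellow background and the white bar are brighter than 128 after the grayscale conversion, so the fixed threshold maps both to white and the bar disappears.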
My output images are very poor input for OCR, like this one:
What am I doing wrong? What would be the best approach for white text on a light-colored background?
Another question: is my binarization step too strong/exaggerated?
Should I mix some filters? Could you suggest some?
PS: I am a total newbie to image processing, so please keep it simple =x
Thanks so much for your attention, help, and advice.