ImageMagick to preprocess image for tesseract-ocr

Question

Is there any way to process an image like this with ImageMagick so that tesseract-ocr can convert it to text?

Because of the lines in the background, conventional methods give me nonsense. Does anyone know how to deal with an image such as this?

'convert -density 300 -units PixelsPerInch -type Grayscale +compress input.png input.tif' followed by 'tesseract input.tif output -l eng' gives me utter garbage.
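For reference, the same two-step pipeline can be driven from Python with the standard 'subprocess' module. This is a minimal sketch that only reproduces the commands quoted above. It assumes the ImageMagick 'convert' binary and 'tesseract' are on PATH. The helper names 'build_commands' and 'run_pipeline' and the 'extra_ops' hook (for splicing further ImageMagick operators in after the input file) are hypothetical, not part of either tool.

```python
import subprocess
from typing import List, Optional, Sequence


def build_commands(src: str, tif: str, out_base: str, lang: str = "eng",
                   extra_ops: Optional[Sequence[str]] = None) -> List[List[str]]:
    """Return argv lists for the convert -> tesseract pipeline from the question.

    extra_ops is a hypothetical hook: its items are placed after the input
    filename, where ImageMagick expects image operators.
    """
    convert_cmd = ["convert", "-density", "300", "-units", "PixelsPerInch",
                   "-type", "Grayscale", "+compress", src]
    convert_cmd += list(extra_ops or [])
    convert_cmd.append(tif)
    # tesseract writes its recognized text to "<out_base>.txt"
    tesseract_cmd = ["tesseract", tif, out_base, "-l", lang]
    return [convert_cmd, tesseract_cmd]


def run_pipeline(src: str, tif: str, out_base: str, **kwargs) -> str:
    """Run both steps (assumes convert and tesseract are installed) and return the text."""
    for cmd in build_commands(src, tif, out_base, **kwargs):
        subprocess.run(cmd, check=True)  # raises CalledProcessError if a step fails
    with open(out_base + ".txt", encoding="utf-8") as fh:
        return fh.read()
```

Passing argv lists instead of a shell string avoids quoting problems with file names that contain spaces.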

Alternatively, are there any tools other than ImageMagick that I can use to preprocess such an image, either on the command line or in Python?

Aleksander Grzyb · Accepted Answer · 2014-02-27T08:33:28

Have you tried applying morphology operations (see ImageMagick's "Morphology of Shapes" guide) after converting the image to grayscale?
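As a rough illustration of what these operations do, here is a minimal pure-Python sketch on a tiny binary image. It uses no ImageMagick or third-party packages. The function names and the BT.601 grayscale weights are my own illustrative choices, not ImageMagick's internals. An opening (erosion followed by dilation) removes ink features too small to contain the structuring element. A related trick opens with a long, one-pixel-high element to isolate horizontal ruled lines and then subtracts them, which spares thin glyph strokes. ImageMagick exposes the same operations through its '-morphology' option.

```python
from typing import List, Tuple

Gray = List[List[int]]      # 0 (black) .. 255 (white)
Binary = List[List[bool]]   # True = ink (foreground)


def to_gray(rgb: List[List[Tuple[int, int, int]]]) -> Gray:
    """Convert RGB pixels to 8-bit luma using BT.601 weights (one common choice)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def binarize(gray: Gray, threshold: int = 128) -> Binary:
    """Mark pixels darker than the threshold as ink (dark text on light paper)."""
    return [[v < threshold for v in row] for row in gray]


def _window(img: Binary, y: int, x: int, rx: int, ry: int):
    """Yield the (2*rx+1) wide by (2*ry+1) tall neighbourhood; outside pixels count as background."""
    h, w = len(img), len(img[0])
    for dy in range(-ry, ry + 1):
        for dx in range(-rx, rx + 1):
            yield 0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]


def erode(img: Binary, rx: int, ry: int) -> Binary:
    """A pixel stays ink only if its whole window is ink."""
    return [[all(_window(img, y, x, rx, ry)) for x in range(len(img[0]))]
            for y in range(len(img))]


def dilate(img: Binary, rx: int, ry: int) -> Binary:
    """A pixel becomes ink if any pixel in its window is ink."""
    return [[any(_window(img, y, x, rx, ry)) for x in range(len(img[0]))]
            for y in range(len(img))]


def opening(img: Binary, rx: int, ry: int) -> Binary:
    """Erode then dilate: drops ink features that cannot contain the window."""
    return dilate(erode(img, rx, ry), rx, ry)


def remove_horizontal_lines(img: Binary, min_half_len: int) -> Binary:
    """Subtract horizontal ink runs at least 2*min_half_len+1 pixels long (e.g. ruled lines)."""
    lines = opening(img, min_half_len, 0)  # 1-pixel-high, wide structuring element
    return [[p and not q for p, q in zip(prow, qrow)]
            for prow, qrow in zip(img, lines)]
```

For example, on a 9x12 grid holding a one-pixel ruled line, a 3x3 blob and a one-pixel vertical stroke, 'opening(img, 1, 1)' keeps only the blob, while 'remove_horizontal_lines(img, 4)' keeps both the blob and the stroke. The kernel sizes here are illustrative. Suitable values depend on the line and stroke widths in the actual scan. These naive loops are also far too slow for full-size pages, where ImageMagick's '-morphology' is the practical choice.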
