
Some images containing text are not recognized by Tesseract.

For example, consider the following Rails image, which Tesseract does not recognize:


When the above image is OCRed, Tesseract produces no output at all.

For other images, the accuracy is not up to the mark.

I am using Ruby on Rails, and to implement Tesseract OCR text recognition I am using the tesseract gem along with some code of my own. What is the problem, and how can I get output with good accuracy?

Did you try to use OCR through docsplit? – apneadiving
Is it only for PDF documents? I want to use OCR for images only. – My God
Did you try it on a basic black-and-white image containing only text, to make sure Tesseract is working? – slykat

1 Answer


The problem is that Tesseract is designed for images that contain only text. Results for images like the one you posted are not guaranteed.

You will need to preprocess the image first: crop it down to just the text region, and convert it to black text on a white background.
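As a rough illustration of those two steps, here is a minimal pure-Ruby sketch. It assumes the image has already been decoded into a grayscale array of rows (each pixel an Integer in 0..255); decoding is left to whatever image library you use. The module name `OcrPrep` and its methods are hypothetical, not part of the tesseract gem. Binarization uses Otsu's method to pick a threshold, inverts the result if the background turns out to be dark, and then crops to the bounding box of the black (text) pixels.

```ruby
# Hypothetical preprocessing helpers (not part of any gem).
# Input: grayscale image as an Array of rows, each pixel an Integer in 0..255.
module OcrPrep
  module_function

  # Otsu's method: choose the threshold that maximizes between-class variance.
  def otsu_threshold(pixels)
    hist = Array.new(256, 0)
    pixels.each { |row| row.each { |v| hist[v] += 1 } }
    total = hist.sum
    sum_all = hist.each_with_index.sum { |count, v| count * v }

    best_t = 0
    best_var = -1.0
    weight_bg = 0
    sum_bg = 0
    256.times do |t|
      weight_bg += hist[t]
      sum_bg += t * hist[t]
      next if weight_bg.zero?
      weight_fg = total - weight_bg
      break if weight_fg.zero?
      mean_bg = sum_bg.to_f / weight_bg
      mean_fg = (sum_all - sum_bg).to_f / weight_fg
      between = weight_bg * weight_fg * (mean_bg - mean_fg)**2
      if between > best_var
        best_var = between
        best_t = t
      end
    end
    best_t
  end

  # Map every pixel to pure black (0) or white (255), then make sure the
  # result is dark text on a light background.
  def binarize(pixels)
    t = otsu_threshold(pixels)
    bin = pixels.map { |row| row.map { |v| v <= t ? 0 : 255 } }
    # Assumption: the background covers most of the image, so if black
    # pixels are the majority, the background is dark and we invert.
    black = bin.sum { |row| row.count(0) }
    total = bin.sum(&:size)
    bin = bin.map { |row| row.map { |v| 255 - v } } if black * 2 > total
    bin
  end

  # Crop to the bounding box of black (text) pixels, keeping a small margin.
  def crop_to_text(bin, margin: 1)
    rows = bin.each_index.select { |y| bin[y].include?(0) }
    return bin if rows.empty?
    cols = (0...bin.first.size).select { |x| bin.any? { |row| row[x] == 0 } }
    y0 = [rows.first - margin, 0].max
    y1 = [rows.last + margin, bin.size - 1].min
    x0 = [cols.first - margin, 0].max
    x1 = [cols.last + margin, bin.first.size - 1].min
    bin[y0..y1].map { |row| row[x0..x1] }
  end

  # Full pipeline: binarize, then crop.
  def prepare(pixels, margin: 1)
    crop_to_text(binarize(pixels), margin: margin)
  end

  # Serialize as an ASCII PGM (P2) string so the result can be saved to disk
  # and passed to Tesseract.
  def to_pgm(bin)
    header = "P2\n#{bin.first.size} #{bin.size}\n255\n"
    header + bin.map { |row| row.join(" ") }.join("\n") + "\n"
  end
end
```

For example, a dark image with light text comes out as black text on white, trimmed to the text plus a one-pixel border:

```ruby
img = [
  [20, 20, 20, 20, 20, 20],
  [20, 20, 20, 20, 20, 20],
  [20, 20, 230, 230, 20, 20],
  [20, 20, 230, 20, 20, 20],
  [20, 20, 20, 20, 20, 20]
]
File.write("cleaned.pgm", OcrPrep.to_pgm(OcrPrep.prepare(img)))
```

A single global threshold is the simplest choice and works when lighting is even. For photos or logos with gradients, an adaptive (local) threshold usually does better. In practice, ImageMagick's `-colorspace Gray`, `-threshold`, `-negate`, and `-trim` options perform the same steps without custom code.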