2
votes

I am using Tesseract for OCR in my Android application.

I take pictures with the default camera app and feed the image directly to the Tesseract library. I am currently using this sample project. Whenever I process an image, I get inaccurate results along with many random single letters. I have read many posts and found that I need to preprocess the picture before running OCR, so please help me with this. Any detailed explanation of how to process the image would be extremely helpful.

Thanks.

This is the sample image

1
Sample image required. - Yves Daoust
I want to use it for any kind of photo taken from books; still, I will provide a sample image. - nihartrivedi810
There are two possible issues with this picture: 1) it is severely damaged by JPEG compression (is it the one used for OCR?); 2) it does not look perfectly sharp, and I suspect motion blur (but it is hard to tell because of 1). This font face (Times?) does not withstand such degradation because its strokes are very thin in places. The best cure for motion blur: hold the camera firmly. - Yves Daoust
Are there any techniques to improve the image quality before processing, maybe using OpenCV? I want very accurate results and I don't care about the processing time. - nihartrivedi810
Yes, there are deblurring techniques. I don't believe they can really rescue such images (personal opinion). I do believe that you should spend some effort to get good images. - Yves Daoust

1 Answer

1
votes

I got a pretty good result (85%) by applying a threshold filter.

Note that your input image is not the best it could be:

  • It is blurry
  • There appears to be text from the back side of the page coming through
  • The page is at an angle

If you can ensure that the page is photographed head-on, that no text bleeds through from the page underneath, and that the image is in focus, then you should look at applying an Otsu or adaptive threshold with OpenCV before passing the image to Tesseract. I have often had better results performing my own thresholding than leaving it to Tesseract.

http://docs.opencv.org/trunk/doc/py_tutorials/py_imgproc/py_thresholding/py_thresholding.html
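
To make the thresholding idea concrete, here is a minimal sketch of Otsu's method in plain Python. It is not the code I ran on your image, and the function names `otsu_threshold` and `binarize` are made up for illustration. It picks the gray level that best separates dark text from bright paper by maximising the between-class variance, then turns every pixel pure black or pure white. In OpenCV the equivalent is `cv2.threshold` with the `cv2.THRESH_BINARY + cv2.THRESH_OTSU` flags, and `cv2.adaptiveThreshold` covers the adaptive case, which is likely the better choice when lighting is uneven across the page.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a sequence of 8-bit gray values (0-255)."""
    # Build the gray-level histogram.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= candidate threshold
    sum_bg = 0  # sum of the gray values of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels in the dark class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor of 1 / total**2).
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, threshold):
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [[255 if p > threshold else 0 for p in row] for row in image]


# Tiny made-up grayscale "image": bright paper (~200) with dark strokes (~40).
image = [
    [200, 200, 40, 200],
    [210, 50, 45, 205],
    [198, 202, 38, 199],
]
t = otsu_threshold([p for row in image for p in row])
binary = binarize(image, t)
print(t)
for row in binary:
    print(row)
```

The binarized image, rather than the raw photo, is what you would then hand to Tesseract. On a real photo you would first convert the picture to grayscale; the nested lists here only stand in for that grayscale data.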