I am new to the Tesseract library and have set it up on Ubuntu 12.04.
I am trying to recognize the images in this data set. When I fed these images to Tesseract as they are (without any preprocessing) using this code, I got approximately 70-75% accuracy.
I want the accuracy to be above 90%, so I applied the following preprocessing steps to enhance the images.
Steps for Preprocessing
- Applied a bottom-hat operator with a circular structuring element of radius 12
- Complemented the image to make the background white and the text black
- Enhanced the contrast of the resulting image
- Eroded the image
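
The steps above can be sketched roughly as follows in Python with NumPy. This is only an illustration under assumptions, not the exact pipeline used: the linear min/max contrast stretch, the 3x3 element for the final erosion, and all function names (`disk`, `bottom_hat`, `stretch_contrast`, `preprocess`) are hypothetical choices, and the input is assumed to be a 2-D `uint8` grayscale array with dark text on a lighter background.

```python
import numpy as np


def disk(radius):
    """Boolean circular structuring element of the given radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return (x * x + y * y) <= radius * radius


def _morph(img, footprint, reduce_fn, pad_value):
    """Apply reduce_fn (np.maximum or np.minimum) over a symmetric footprint."""
    ry, rx = footprint.shape[0] // 2, footprint.shape[1] // 2
    padded = np.pad(img, ((ry, ry), (rx, rx)),
                    mode="constant", constant_values=pad_value)
    out = np.full(img.shape, pad_value, dtype=img.dtype)
    h, w = img.shape
    # Combine one shifted copy of the image per active footprint cell.
    for dy, dx in zip(*np.nonzero(footprint)):
        out = reduce_fn(out, padded[dy:dy + h, dx:dx + w])
    return out


def dilate(img, footprint):
    # Local maximum; padding with 0 keeps the border neutral.
    return _morph(img, footprint, np.maximum, 0)


def erode(img, footprint):
    # Local minimum; padding with 255 keeps the border neutral.
    return _morph(img, footprint, np.minimum, 255)


def bottom_hat(img, footprint):
    """Morphological closing minus the image: dark details become bright."""
    closing = erode(dilate(img, footprint), footprint)
    return closing - img  # closing >= img, so uint8 does not underflow


def stretch_contrast(img):
    """Assumed contrast step: linear stretch of [min, max] to [0, 255]."""
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:
        return img.copy()
    scaled = (img.astype(np.float64) - lo) * 255.0 / (hi - lo)
    return np.round(scaled).astype(np.uint8)


def preprocess(gray):
    """Bottom-hat (radius 12), complement, contrast stretch, then erosion."""
    text_bright = bottom_hat(gray, disk(12))
    text_dark = 255 - text_bright          # white background, black text
    enhanced = stretch_contrast(text_dark)
    # Assumed 3x3 element. Erosion takes the local minimum, so on a
    # black-on-white image it thickens the dark strokes.
    return erode(enhanced, np.ones((3, 3), dtype=bool))
```

Calling `preprocess(gray)` returns a `uint8` array of the same shape as the input. Note that the last step makes the black strokes wider, since erosion on a white background grows the dark regions.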
After these steps I get pretty clear images, which can be seen here. But now when I feed these images to Tesseract using the same code, the accuracy drops to below 50%, and I don't know why. Is it because Tesseract does some preprocessing of its own? If so, how can I stop Tesseract from doing that preprocessing? If not, why is it giving me worse results when the images are much clearer now? Pardon me if this is a basic question.