Image Preprocessing steps to improve the recognition rate

Question

I am building a simple OCR Android app using TessBaseAPI for my project. I have applied some image preprocessing steps such as binarization and image enhancement, but the recognition rate is only 50% to 60%. How can I improve it?

I include two sample images.

http://imageshack.us/photo/my-images/94/1school.jpg/

http://imageshack.us/photo/my-images/43/15071917.jpg/

Kurt Pfeifle · Accepted Answer · 2012-08-18T22:22:54

The following additions to the above command work for your second image:

-negate \
-deskew 40% \
+repage \
-crop 393x110+0+0 \

They add appropriate levels of deskewing and cropping to the result, so that Tesseract's life gets a bit easier...

So the complete command should be the following, which produces the correct result on my system:

convert 15071917.jpg            \
   -type grayscale              \
   -negate                      \
   -gamma 1                     \
   -contrast  -contrast  -contrast  -contrast  -contrast  -contrast  -contrast  -contrast  -contrast  -contrast  \
   -normalize -normalize -normalize -normalize -normalize -normalize -normalize -normalize -normalize -normalize \
   -despeckle -despeckle -despeckle -despeckle -despeckle -despeckle -despeckle -despeckle -despeckle -despeckle \
   -negate                      \
   -deskew 40%                  \
   +repage                      \
   -crop 393x110+0+0            \
    15071917.png                \
&&                              \
tesseract 15071917.png OUT && cat OUT.txt

  Tesseract Open Source OCR Engine v3.01 with Leptonica
    Page 0
    TESCO
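The repeated options make the command hard to read and to tune. As a rough sketch (not part of the original answer), the same option list can be generated by small shell functions. The helper names `repeat_opt`, `preprocess_opts` and `ocr_image` are made up for illustration, and the crop geometry `393x110+0+0` is still specific to this one image, so other images will need their own values.

```shell
#!/bin/sh
# Hypothetical helpers: they only rebuild the option list of the command above.

# Print OPTION repeated COUNT times on one line, e.g. "-contrast -contrast".
repeat_opt() {
    opt=$1
    count=$2
    out=""
    i=0
    while [ "$i" -lt "$count" ]; do
        out="$out $opt"
        i=$((i + 1))
    done
    printf '%s\n' "${out# }"
}

# Print every option that sits between the input and output file names.
# The optional argument is the repeat count (default 10, as in the command above).
preprocess_opts() {
    n=${1:-10}
    printf '%s\n' "-type grayscale -negate -gamma 1 $(repeat_opt -contrast "$n") $(repeat_opt -normalize "$n") $(repeat_opt -despeckle "$n") -negate -deskew 40% +repage -crop 393x110+0+0"
}

# Preprocess IMAGE with ImageMagick, then run Tesseract on the result.
# Requires the convert and tesseract commands to be installed.
ocr_image() {
    in=$1
    base=${in%.*}
    # The option list is deliberately left unquoted so that it splits into words.
    convert "$in" $(preprocess_opts) "$base.png" &&
        tesseract "$base.png" "$base" &&
        cat "$base.txt"
}
```

With these functions loaded, `ocr_image 15071917.jpg` should run the same steps as the full command, and `preprocess_opts 3` prints a shorter option list that is easier to experiment with.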

This is the original picture (left) with the resulting picture of the modified command (right):
