I am trying to extract alphanumeric characters (a-z0-9) that do not form meaningful words from an image taken with a consumer camera (including mobile phones). The characters all have the same size and font and are not formatted. The actual processing is done under Windows.
The following image shows the raw input:
After perspective processing I apply the following with OpenCV:
- Convert from RGB to gray
- Apply `cv::medianBlur` to remove noise
- Convert the image to binary using adaptive thresholding (`cv::adaptiveThreshold`)
- I know the number of rows and columns of the grid. Thus I simply extract each grid cell using this information.
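A minimal sketch of how the cell extraction step could be computed from the known grid dimensions. `CellRect`, `gridCells`, and the `margin` parameter are hypothetical names for illustration; the margin is an assumption that shrinks each cell slightly so border grid lines stay out of the crop, and it assumes the perspective-corrected image contains exactly the grid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Axis-aligned cell rectangle in pixel coordinates (hypothetical helper type).
struct CellRect {
    int x, y, width, height;
};

// Split a width x height image into rows x cols equally sized cells.
// `margin` (an assumption, not part of the original pipeline) shrinks each
// cell on all four sides so grid lines on the cell border are excluded.
std::vector<CellRect> gridCells(int width, int height, int rows, int cols, int margin = 0) {
    assert(width > 0 && height > 0 && rows > 0 && cols > 0 && margin >= 0);
    std::vector<CellRect> cells;
    cells.reserve(static_cast<std::size_t>(rows) * static_cast<std::size_t>(cols));
    for (int r = 0; r < rows; ++r) {
        // Boundaries are derived from the full extent so rounding does not accumulate.
        const int y0 = r * height / rows;
        const int y1 = (r + 1) * height / rows;
        for (int c = 0; c < cols; ++c) {
            const int x0 = c * width / cols;
            const int x1 = (c + 1) * width / cols;
            const int w = x1 - x0 - 2 * margin;
            const int h = y1 - y0 - 2 * margin;
            if (w <= 0 || h <= 0) continue;  // margin too large for this cell
            cells.push_back({x0 + margin, y0 + margin, w, h});
        }
    }
    return cells;
}
```

Each `CellRect` maps directly onto a `cv::Rect(x, y, width, height)`, which can be used to crop the binary `cv::Mat`.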
After all these steps I get images which look similar to these:
Then I run tesseract (latest SVN version with latest training data) on each extracted cell image individually (I tried different `-psm` and `-l` values):
tesseract.exe -l eng -psm 11 sample.png outtext
The results produced by tesseract are not very good:
- Most characters are not recognized.
- The grid lines are sometimes interpreted as "l" or "i" characters.
I already experimented with morphological operations (open, close, erode, dilate) and replaced adaptive thresholding with Otsu thresholding (`THRESH_OTSU`), but the results got worse.
What else could I try to improve the recognition quality? Or is there a better method for extracting the characters than tesseract (for instance, template matching)?
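For context, the score behind template matching by normalized cross correlation can be sketched as follows. This is an illustration only: `nccScore` is a hypothetical helper, it uses the zero-mean variant (comparable to OpenCV's `TM_CCOEFF_NORMED` mode), and it assumes both patches have already been scaled to the same size and flattened into pixel arrays.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Zero-mean normalized cross correlation between two equally sized grayscale
// patches stored as flat pixel arrays. The result lies in [-1, 1]; 0 is
// returned when either patch is completely flat (zero variance).
double nccScore(const std::vector<std::uint8_t>& a, const std::vector<std::uint8_t>& b) {
    assert(!a.empty() && a.size() == b.size());
    const double n = static_cast<double>(a.size());

    // First pass: mean intensity of each patch.
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        meanA += a[i];
        meanB += b[i];
    }
    meanA /= n;
    meanB /= n;

    // Second pass: covariance and variances around the means.
    double num = 0.0, varA = 0.0, varB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double da = a[i] - meanA;
        const double db = b[i] - meanB;
        num += da * db;
        varA += da * da;
        varB += db * db;
    }
    const double denom = std::sqrt(varA * varB);
    return denom > 0.0 ? num / denom : 0.0;
}
```

A cell would be compared against one reference patch per character, and the character with the highest score taken as the match.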
Edit (21-12-2014):
I tested simple template matching (using normalized cross correlation and LMS), but the results were even worse. However, I made a big step forward by extracting each character using `findContours` and then running tesseract on one character at a time with the `-psm 10` option, which interprets each input image as a single character. Additionally, I remove non-alphanumeric characters in a post-processing step. The first results are encouraging, with detection rates of 90% and better. The main problem is misdetection of the "9", "g", and "q" characters.
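The post-processing step could look roughly like the sketch below. `keepAlphanumeric` is a hypothetical helper, and folding uppercase letters to lowercase (rather than dropping them) is an assumption based on the expected alphabet being a-z0-9.

```cpp
#include <cassert>
#include <string>

// Keep only characters from the expected alphabet (a-z, 0-9).
// Assumption: uppercase letters are folded to lowercase instead of being dropped.
// Everything else (whitespace, newlines, punctuation, stray symbols) is removed.
std::string keepAlphanumeric(const std::string& ocrText) {
    std::string out;
    out.reserve(ocrText.size());
    for (unsigned char ch : ocrText) {
        if (ch >= 'A' && ch <= 'Z') {
            ch = static_cast<unsigned char>(ch - 'A' + 'a');
        }
        const bool isLetter = ch >= 'a' && ch <= 'z';
        const bool isDigit = ch >= '0' && ch <= '9';
        if (isLetter || isDigit) {
            out.push_back(static_cast<char>(ch));
        }
    }
    assert(out.size() <= ocrText.size());  // filtering never grows the text
    return out;
}
```

For example, a single-character result such as `"g\n\n"` is reduced to `"g"`, and a result consisting only of symbols becomes an empty string, which can then be treated as "no character detected".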