I am using OpenCV 2.4 and Tesseract 3.
I am trying to run OCR on a 14-segment display captured from a webcam.
The issue is that when I trained Tesseract, I applied enough erosion/dilation to fill the gaps between the segments. However, the image I read from the webcam needs to be pre-processed to remove noise. For this I also use erosion and dilation, but the resulting image doesn't have its segments linked:
What I trained Tesseract with (the letter "V"): http://i.imgur.com/NbmVqkb.png (all segments are linked)
What I feed Tesseract with: http://i.imgur.com/0E4iXXk.png (some segments are linked, some aren't)
The OCR result is different every time; it can be "OVO" as well as "EB". I thought that training Tesseract on something closer to what I am actually reading (unlinked segments) might work better, but Tesseract can't be trained with blank spaces like this (it reports "Empty page").
Does anyone have an idea how to solve this?
I tried increasing the amount of erosion/dilation, but then other letters aren't recognized (B and D get confused) and overall accuracy drops.
Thank you!
EDIT: Basically, what I need is either a way to link the segments together so that Tesseract can read the characters more easily, OR a way to train Tesseract with unlinked segments (from what I've seen, that isn't possible).
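A minimal sketch of the "link the segments" idea, assuming a binary image (1 = lit segment pixel): noise removal by erosion followed by dilation is a morphological opening, and the reverse order, dilation followed by erosion with the same rectangular kernel, is a morphological closing, which bridges small gaps while keeping strokes roughly the same thickness. In OpenCV 2.4 both are available through `morphologyEx` with `MORPH_OPEN` / `MORPH_CLOSE` and a kernel from `getStructuringElement(MORPH_RECT, ...)`. The code below is a pure-Python stand-in (no OpenCV) so the effect is easy to inspect; the function names and the kernel sizes in the examples are hypothetical, not tuned for this display.

```python
# Pure-Python sketch of binary morphology (illustration only, not OpenCV).
# Images are lists of rows of 0/1; 1 = lit segment pixel.
# Kernel sizes kw (width) and kh (height) are assumed to be odd.
# Pixels outside the image are ignored, similar to OpenCV's default border.

def _window(img, y, x, kw, kh):
    """Yield the in-bounds pixels of a kw x kh rectangle centred on (y, x)."""
    h, w = len(img), len(img[0])
    for yy in range(y - kh // 2, y + kh // 2 + 1):
        for xx in range(x - kw // 2, x + kw // 2 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                yield img[yy][xx]

def dilate(img, kw, kh):
    # A pixel becomes lit if any pixel under the kernel is lit.
    return [[1 if any(_window(img, y, x, kw, kh)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]

def erode(img, kw, kh):
    # A pixel stays lit only if every pixel under the kernel is lit.
    return [[1 if all(_window(img, y, x, kw, kh)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]

def opening(img, kw, kh):
    # Erode then dilate: removes specks smaller than the kernel (noise).
    return dilate(erode(img, kw, kh), kw, kh)

def closing(img, kw, kh):
    # Dilate then erode: fills gaps of up to kw - 1 pixels horizontally
    # (kh - 1 vertically) between lit runs, without thickening them much.
    return erode(dilate(img, kw, kh), kw, kh)

# Isolated noise pixel at x=1 is removed, the 3-pixel bar survives.
print(opening([[0, 1, 0, 0, 1, 1, 1, 0, 0]], 3, 1))  # → [[0, 0, 0, 0, 1, 1, 1, 0, 0]]

# Two horizontal segment pieces separated by a 1-pixel gap get linked.
print(closing([[0, 0, 1, 1, 0, 1, 1, 0, 0]], 3, 1))  # → [[0, 0, 1, 1, 1, 1, 1, 0, 0]]
```

The usual order would be a small opening to remove noise, then a closing whose kernel is just large enough to bridge the gaps between segments. Using a kernel elongated along one axis (e.g. 1 x 3 for vertical segments) closes gaps in that direction while limiting how much neighbouring strokes merge, which may matter for letters like B and D.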