
Hello, I am trying to train Tesseract for a new font based on the following digits (image: digits with transparent background).

All digits are provided in a PNG file with a transparent background. If I create a box file from it, train on it, and complete the remaining steps, everything works fine.

Now the problem: in the same situation, I want to train Tesseract on the following image instead (image: digits without transparent background).

As you can see, the digits and their positions are exactly the same. The only difference from image 1 is that I used a yellow background, and with that change nothing works anymore. I created a box file with the same positions as for the first image:

0 5 4 20 22 0
1 27 4 38 21 0
2 48 4 60 22 0
3 71 3 83 22 0
4 94 5 109 22 0
5 119 5 131 22 0
6 143 5 157 22 0
7 172 5 184 22 0
8 197 5 211 23 0
9 224 5 238 22 0
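For reference, each line of a Tesseract box file has the layout symbol, left, bottom, right, top, page, with coordinates measured from the bottom-left corner of the image. The following is a minimal sketch, not part of the Tesseract tooling, for sanity-checking a box file like the one above before training. The names `Box`, `parse_box_line`, and `parse_box_file` are hypothetical helpers introduced only for illustration.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """One box-file entry (hypothetical helper type, not a Tesseract API)."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    """Parse '<symbol> <left> <bottom> <right> <top> <page>' into a Box."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
    symbol = parts[0]
    left, bottom, right, top, page = (int(p) for p in parts[1:])
    # A usable box must have positive width and height.
    if not (left < right and bottom < top):
        raise ValueError(f"degenerate box: {line!r}")
    return Box(symbol, left, bottom, right, top, page)


def parse_box_file(text: str) -> List[Box]:
    """Parse every non-empty line of a box file."""
    return [parse_box_line(ln) for ln in text.splitlines() if ln.strip()]


# The box file from the question.
BOX_TEXT = """\
0 5 4 20 22 0
1 27 4 38 21 0
2 48 4 60 22 0
3 71 3 83 22 0
4 94 5 109 22 0
5 119 5 131 22 0
6 143 5 157 22 0
7 172 5 184 22 0
8 197 5 211 23 0
9 224 5 238 22 0
"""

boxes = parse_box_file(BOX_TEXT)
for b in boxes:
    # Print each symbol with its box width and height in pixels.
    print(b.symbol, b.right - b.left, b.top - b.bottom)
```

A check like this only confirms that the box file itself is well formed. It says nothing about whether Tesseract can separate the glyphs from the background inside each box.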

Then I trained on the box file, but the resulting .tr file is completely empty. I did not stop there and completed all the other steps anyway, but the resulting font is unusable.

So my question is: how can I train Tesseract to recognize these digits no matter which background is used for them?

Edit 2016-04-16:

I used ImageMagick to preprocess the images and found a command that works very well for all kinds of backgrounds. I then wanted to train Tesseract on the preprocessed images, but it did not work as I expected. First I created the box files, most of which came out empty. So I used a website to set the character positions by hand and spent a lot of time making the cropping perfect. Afterwards I created the resulting .tr files and carried out the remaining training steps.

Finally I got the "traineddata" file, moved it to Tesseract's "tessdata" directory, and used it as intended:

tesseract example.jpg output -l mg

(I called the new font "mg".)

However, it fails to recognize all or most of the digits. I opened this thread to find help, but so far nobody seems to know how to do this. Please help me out.

All the Tesseract training files that I used and created can be found here:

Tesseract training directory (not zipped or compressed; the link shows all files in the directory)

Maybe OT, but you could preprocess to remove the background color. - xvan

1 Answer


You can convert any color image to a binary (black-and-white) image and then run Tesseract on it. That way the result no longer depends on which background color is used.
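To illustrate the idea, here is a minimal sketch of one common way to binarize an image: convert it to grayscale and apply Otsu's threshold. This is an assumption about how the conversion could be done, not a method Tesseract prescribes. It uses only the standard library and works on a small hand-made pixel grid; in practice you would load real pixels with an imaging library or do the conversion with a tool such as ImageMagick. The function names and the sample image are hypothetical, and the sketch ignores alpha channels.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def to_gray(pixel: RGB) -> int:
    """Convert an RGB pixel to 0..255 luminance (ITU-R BT.601 weights)."""
    r, g, b = pixel
    return min(255, round(0.299 * r + 0.587 * g + 0.114 * b))


def otsu_threshold(gray: List[int]) -> int:
    """Return the threshold t that maximizes between-class variance."""
    hist = [0] * 256
    for v in gray:
        hist[v] += 1
    total = len(gray)
    sum_all = sum(i * h for i, h in enumerate(hist))
    weight_bg = 0      # number of pixels with value <= t
    sum_bg = 0         # sum of those pixel values
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: List[List[RGB]]) -> List[List[int]]:
    """Map every pixel to 0 (black) or 255 (white) using Otsu's threshold."""
    gray_rows = [[to_gray(p) for p in row] for row in image]
    t = otsu_threshold([v for row in gray_rows for v in row])
    return [[255 if v > t else 0 for v in row] for row in gray_rows]


# Hypothetical 3x4 sample: black "ink" on a yellow background.
YELLOW, BLACK = (255, 255, 0), (0, 0, 0)
sample = [
    [YELLOW, BLACK, BLACK, YELLOW],
    [YELLOW, BLACK, YELLOW, YELLOW],
    [YELLOW, BLACK, BLACK, YELLOW],
]
binary = binarize(sample)
for row in binary:
    print(row)
```

After this step the yellow background and a white background produce the same black-on-white result, which is the point of binarizing before training or recognition. Light text on a dark background would come out inverted and would need the two output values swapped.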