The Tesseract documentation explains a method of training with sample text and a font.
I used jTessBoxEditor, which works pretty much like the Tesseract training tools.
I got somewhat acceptable results this way, but I suspect the optimal solution would be to train Tesseract on the actual kind of images it will have to recognize.
Since I only need to recognize digits, I could crop each of them by hand (maybe many versions of each digit) and train Tesseract with those images, even setting the boxes by hand.
Is there a way to do this?
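Setting boxes by hand for single-digit crops is mostly arithmetic, because each crop holds exactly one glyph. The sketch below assumes the classic Tesseract box-file layout (`<char> <left> <bottom> <right> <top> <page>`, pixel coordinates measured from the bottom-left corner of the image) and assumes each crop becomes one page of a multi-page TIFF. The helper names `box_line` and `box_file` are hypothetical, and the image sizes would have to come from whatever tool you use to read the crops.

```python
def box_line(char: str, width: int, height: int, margin: int = 0, page: int = 0) -> str:
    """Return one box-file line for a crop that contains a single glyph.

    Coordinates are left, bottom, right, top in pixels, measured from the
    crop's bottom-left corner, followed by the page index in the TIFF.
    """
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    if 2 * margin >= min(width, height):
        raise ValueError("margin leaves no room for the glyph")
    left, bottom = margin, margin
    right, top = width - margin, height - margin
    return f"{char} {left} {bottom} {right} {top} {page}"


def box_file(samples) -> str:
    """Build a whole box file from (digit, width, height) tuples.

    Assumes the i-th tuple describes page i of a multi-page TIFF.
    """
    lines = [box_line(d, w, h, page=i) for i, (d, w, h) in enumerate(samples)]
    return "\n".join(lines) + "\n"


print(box_line("7", 20, 32))  # 7 0 0 20 32 0
```

The `margin` parameter is only useful if you leave some background around each digit when cropping; with tight crops the box simply covers the whole image.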
If you are trying to train Tesseract 4, you can use ocrd-train: you basically prepare one image per line of text together with its ground-truth transcription, and it does all the remaining work for you.
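For the digit use case, each hand-cropped digit (or strip of digits) can serve as one "line" image. As far as I know, ocrd-train pairs each `.tif` line image with a plain-text transcription that has the same base name and the extension `.gt.txt`; check the README of the version you use for the exact folder it reads from. The sketch below only writes those sidecar files. The function `write_ground_truth` and the `labels` mapping are hypothetical, not part of ocrd-train.

```python
from pathlib import Path


def write_ground_truth(labels: dict, directory: str) -> list:
    """Write <name>.gt.txt next to every <name>.tif listed in `labels`.

    `labels` maps an image file name (e.g. "digit_0001.tif") to the text it
    shows. Returns the paths of the transcription files that were written.
    """
    root = Path(directory)
    written = []
    for image_name, text in sorted(labels.items()):
        image = root / image_name
        if image.suffix != ".tif":
            raise ValueError(f"{image_name}: expected a .tif line image")
        if not image.exists():
            raise FileNotFoundError(image)
        if "\n" in text:
            raise ValueError(f"{image_name}: transcription must be a single line")
        # "digit_0001.tif" -> "digit_0001.gt.txt"
        gt = image.with_name(image.stem + ".gt.txt")
        gt.write_text(text, encoding="utf-8")
        written.append(gt)
    return written
```

Keeping the labels in a small dictionary (or a CSV you load into one) makes it easy to crop many variants of each digit and regenerate all transcriptions in one go before running the training.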