Can you train tesseract with images instead of text and a font?

Question

In the tesseract documentation a method of training with sample text and a font is explained.
I used jTessBoxEditor, but it works pretty much like the tesseract training tools.
I got somewhat acceptable results with this, but I guess the optimal solution would be training tesseract with the actual kind of images it will have to recognize anyway.
As I only need to recognize digits, I could crop each of them by hand, perhaps in many variations, and train tesseract on those images, even setting the boxes by hand.
Is there a way to do this?

Raniem Raniem · Accepted Answer · 2018-11-02T11:49:28

If you are trying to train Tesseract 4, you can use ocrd-train. You basically prepare images, each corresponding to one line of text, together with their ground truth, and it does all the remaining work for you.
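To make the "images plus ground truth" step concrete, here is a minimal sketch in Python. It assumes the layout I remember from the ocrd-train README: each line image is a TIFF named `<name>.tif`, and its transcription sits next to it as a single-line text file named `<name>.gt.txt`. Please check the README of the version you use, since the expected folder name may differ. The helper `write_ground_truth` and the example file names are hypothetical, not part of ocrd-train.

```python
from pathlib import Path


def write_ground_truth(gt_dir, labels):
    """Write one <name>.gt.txt transcription per line image <name>.tif.

    gt_dir: folder that holds (or will hold) the .tif line images.
    labels: dict mapping image file name -> text shown in that image.
    Returns the list of transcription files written.
    """
    gt_dir = Path(gt_dir)
    gt_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_name, text in sorted(labels.items()):
        # Assumed convention: line images are TIFF files ending in .tif.
        if not image_name.endswith(".tif"):
            raise ValueError(f"expected a .tif line image, got {image_name!r}")
        # Assumed convention: each transcription is a single line of text.
        if "\n" in text:
            raise ValueError("each transcription must be a single line")
        gt_path = gt_dir / (image_name[: -len(".tif")] + ".gt.txt")
        gt_path.write_text(text + "\n", encoding="utf-8")
        written.append(gt_path)
    return written
```

For example, `write_ground_truth("data/ground-truth", {"digits_0001.tif": "4711"})` would create `data/ground-truth/digits_0001.gt.txt` containing `4711`. With the images and transcriptions in place, training is started through the project's Makefile (something like `make training MODEL_NAME=digits`, if I recall the README correctly).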
