I have some questions about making tif/box files for Tesseract 4. The TrainingTesseract 4.00 documentation says:
"Making Box Files: As with base Tesseract, there is a choice between rendering synthetic training data from fonts, or labeling some pre-existing images (like ancient manuscripts for example)."
But it does not explain how to train with pre-existing images.
I want to train Tesseract 4 (LSTM) for the Persian language. I have some images of ancient manuscripts and want to train with images and text instead of fonts, so I can’t use the text2image command. I know that the old-format box files will not work for LSTM training.
- How can I make tif/box files for Tesseract 4 LSTM training and label them, and how should I change the tesseract commands?
- Should I use other tools for generating box files (given that Persian is written right to left)?
- Should I use fine-tuning or train from scratch?