I am trying to train Tesseract 4 to recognise electronic circuit diagram symbols, such as resistors and capacitors, from images. There seems to be no straightforward guide to training Tesseract, and the official documentation focuses on fonts rather than image data.
The reply on this post is the most helpful thing I've found so far, but when I follow its steps I get an error (shown below).
What I've done so far:
- Successfully compiled Tesseract 4.1.1 and the training tools on Ubuntu 16
- Successfully cloned the tesstrain repo
- Generated 4 TIFF images of components, named image0.tiff to image3.tiff
- Generated 4 plain text files with matching names, image0.gt.txt to image3.gt.txt
- Each text file has the name of the component in it, eg resistor, capacitor etc.
- Moved these files into the appropriate location (tesstrain/data)
Note: I know I need far more data than this; this is just a test to get everything working and successfully produce a .traineddata file.
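For reference, the layout described in the steps above can be reproduced with a rough sketch like the one below. It makes several assumptions: a temporary directory stands in for the real tesstrain checkout, the image files are empty placeholders rather than real TIFFs, and only "resistor" and "capacitor" are labels taken from this post (the other two are made up for illustration).

```shell
#!/bin/sh
set -eu

# Stand-in for the tesstrain checkout; a temp dir keeps this sketch harmless.
WORK_DIR="$(mktemp -d)"
DATA_DIR="$WORK_DIR/tesstrain/data"
mkdir -p "$DATA_DIR"

# image0..image3: one image and one transcription file per component.
# "inductor" and "diode" are placeholder labels, not from the original data.
i=0
for label in resistor capacitor inductor diode; do
    : > "$DATA_DIR/image$i.tiff"                        # empty placeholder image
    printf '%s\n' "$label" > "$DATA_DIR/image$i.gt.txt" # ground-truth text
    i=$((i + 1))
done

# Show the resulting files (4 images + 4 transcriptions).
ls "$DATA_DIR"
```

Note that everything sits directly in tesstrain/data, which is the arrangement used for the run that follows.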
When I run the command "make training MODEL_NAME=testModel_1" I get the following in my console:
@CKVM1:~/Downloads/tesstrain$ make training MODEL_NAME=testModel_1
find: ‘data/testModel_1-ground-truth’: No such file or directory
find: ‘data/testModel_1-ground-truth’: No such file or directory
Error: missing ground truth for training
Makefile:175: recipe for target 'data/testModel_1/list.train' failed
make: *** [data/testModel_1/list.train] Error 1
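The two "find:" lines name the directory that make searched, data/testModel_1-ground-truth. A minimal sketch for checking whether that directory exists and how many transcription files it holds is given below. The helper name check_ground_truth is hypothetical (it is not part of tesstrain), and the "<MODEL_NAME>-ground-truth" pattern is inferred only from the error message above.

```shell
#!/bin/sh
# Report whether data/<MODEL_NAME>-ground-truth exists under a given checkout.
# check_ground_truth is a hypothetical helper, not part of tesstrain.
check_ground_truth() {
    model_name="$1"
    tesstrain_dir="$2"
    # Directory name pattern taken from the "find:" error lines.
    gt_dir="$tesstrain_dir/data/${model_name}-ground-truth"

    if [ ! -d "$gt_dir" ]; then
        echo "missing directory: $gt_dir"
        return 1
    fi

    # Count the *.gt.txt transcription files inside the directory.
    count="$(find "$gt_dir" -name '*.gt.txt' | wc -l | tr -d ' ')"
    echo "$count transcription file(s) in $gt_dir"
}

# Assumption: run from inside the tesstrain checkout, hence "." as the base.
check_ground_truth "testModel_1" "." || true
```

With the files placed directly in tesstrain/data, this should print the "missing directory" message, matching what find reports.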
I believe the issue is related to the "START_MODEL" parameter, which the instructions in the post I linked say to set. As far as I understand, it uses whichever language you specify as a starting point to reduce training time, but since I'm using custom symbols rather than actual letters, I don't see how that would benefit me. However, it seems that a (more general?) ground truth file is expected to be present before training starts, and I am unsure how to go about providing one.
Any ideas on how to resolve this?