1
votes

enter image description here

I am having a hard time working with Tesseract. Is there a way to improve its accuracy? How do I train it myself, if needed?

The only thing I am doing is reading the following characters: XYZ:-0123456789. That's it! The pictures always look like this.

Thanks!

1
Tesseract is already working as well as it can. Use higher-resolution images. github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract is a good starting point for training Tesseract. - sashoalm
You can use PIL or OpenCV to preprocess the image before sending it to Tesseract. Try to improve the resolution, then dilate the image to disconnect the '-' from any adjacent digit. - thewaywewere

1 Answer

4
votes

The output of Tesseract 4.00alpha with your image is:

$ tesseract ICKcj.png - -l eng
*: 4606 Y; 4809 Z; 698

Warning. Invalid resolution 0 dpi. Using 70 instead.

Resampling the picture to 50% and setting the DPI to 300:

enter image description here

The output with this image is slightly better, and the warning disappears:

$ tesseract ICKcj-50.png - -l eng
X: 4606 Y: 4809 Z: 698
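
The resampling step could be reproduced with Pillow along these lines. This is only a sketch: the answer does not say which tool was used, so the library choice, the LANCZOS filter, and the helper names `target_size` and `resample` are assumptions.

```python
def target_size(size, scale):
    """Return the (width, height) obtained by scaling `size`, at least 1 px each."""
    width, height = size
    return max(1, round(width * scale)), max(1, round(height * scale))


def resample(src, dst, scale=0.5, dpi=300):
    """Resize the image at `src` and save it to `dst` with DPI metadata.

    Hypothetical helper; assumes Pillow is installed. The import is local so
    that target_size() stays usable without it.
    """
    from PIL import Image

    with Image.open(src) as img:
        resized = img.resize(target_size(img.size, scale), Image.LANCZOS)
        # Storing the DPI in the file is what avoids the
        # "Invalid resolution 0 dpi" warning.
        resized.save(dst, dpi=(dpi, dpi))


# Example call (file names taken from the commands above):
# resample("ICKcj.png", "ICKcj-50.png")
```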

The only things missing are the minus signs, which are printed quite irregularly (a higher-resolution picture could help). It is also possible to restrict the output pattern in Tesseract. Alternatively, you can try to infer the minus signs afterwards from the spacing between the X, Y, Z labels and the numbers.
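
One way to restrict the output is the `tessedit_char_whitelist` config variable, passed with `-c`. The sketch below builds such a command and then parses the recognized text with a regular expression. It assumes the `tesseract` binary is on the PATH; whether the whitelist is honoured depends on the Tesseract version and engine (the LSTM engine in 4.0 may ignore it). The helper names are made up for illustration.

```python
import re
import subprocess

# Characters the question says can appear in the pictures.
WHITELIST = "XYZ:-0123456789"

# An axis letter, an optional ':' (or a misread ';'), then a signed integer.
_AXIS = re.compile(r"([XYZ])\s*[:;]?\s*(-?\d+)")


def build_command(image_path, whitelist=WHITELIST):
    """Build the tesseract argument list (hypothetical helper)."""
    return [
        "tesseract", image_path, "-", "-l", "eng",
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def run_tesseract(image_path):
    """Run tesseract and return the text it prints to stdout."""
    result = subprocess.run(
        build_command(image_path), capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_axes(text):
    """Extract {axis: value} pairs from OCR output; unreadable axes are omitted."""
    return {axis: int(value) for axis, value in _AXIS.findall(text)}


print(parse_axes("X: 4606 Y: 4809 Z: 698"))
```

An axis missing from the parsed result (for example X in the first output above, where it was read as `*`) is a cheap signal that the image needs more preprocessing before it is trusted.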