6
votes

I have been implementing an Android OCR tool that uses Tesseract to recognize digits only. So far, it gives quite high accuracy with normal digit fonts. However, the accuracy is terrible when it comes to 7-segment digits (those found on LCDs).

I have tried cropping my image, whitelisting the characters 0 to 9, and some image processing, all to no avail. Any ideas on how to increase the accuracy? Tips on training Tesseract specifically for 7-segment digits would also help me a lot.

Thanks in advance.

1
I don't think you can get good results without retraining. It would be nice if there were a publicly available traineddata file for 7-segment digits, but I wasn't able to find one when I looked. - rmtheis
Thank you for the reply. Your blog really helped me a lot in my implementation, so many thanks to you. I am planning to train it and am looking into bbtesseract for the boxing process. I would highly appreciate it if anyone could share some tips for the training process, because the official documentation is kind of confusing to me. - laurie7
You can use jTessBoxEditor to edit or generate TIFF/box files to be used in training. There's also a PowerShell script, train.ps1, that helps automate the rest of the training. - nguyenq
@laurie7: Did you find a good example for training Tesseract? - Terril Thomas
tesseract img.png out -psm 7 digits: does this command help? - yunas

1 Answer

2
votes

You can find traineddata for 7-segment digits at:

https://github.com/arturaugusto/display_ocr/tree/master/letsgodigital

There is also sample Python code in the same repository.
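
As a rough illustration of how such a traineddata file might be used from Python, here is a minimal sketch that calls the tesseract command-line tool. It is not the repository's own sample code. It assumes the file is named letsgodigital.traineddata (so the language code is letsgodigital) and has been copied into your tessdata directory, and that a recent tesseract binary is on the PATH. The helper names build_tesseract_command and read_display are made up for this example. Older Tesseract 3.x builds spell the page segmentation flag -psm instead of --psm, as in the comment above.

```python
import subprocess


def build_tesseract_command(image_path, lang="letsgodigital", psm=7,
                            tessdata_dir=None, whitelist="0123456789"):
    """Build the argument list for a single-line, digits-only OCR run.

    Assumption: `lang` matches a <lang>.traineddata file that tesseract
    can find (here the hypothetical letsgodigital.traineddata).
    """
    # "stdout" as the output base makes tesseract print the text
    # instead of writing an output file.
    cmd = ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]
    if tessdata_dir:
        # Point tesseract at the folder holding the traineddata file.
        cmd += ["--tessdata-dir", tessdata_dir]
    if whitelist:
        # Restrict recognition to the given characters.
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


def read_display(image_path, **kwargs):
    """Run tesseract on the image and return the recognized text."""
    result = subprocess.run(
        build_tesseract_command(image_path, **kwargs),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Calling read_display("lcd.png") on a cropped, high-contrast image of the display would then return whatever digits Tesseract recognizes. Page segmentation mode 7 treats the image as a single text line, which tends to suit a tightly cropped LCD readout.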