Thanks to many Tesseract OCR-related posts on SO, specifically this one, I'm now on my way to integrating Tesseract into an app for OCR'ing grocery receipts. However, I am getting garbage results and I can't figure out why. I've triple-checked the tessdata language files, which are the English Cube data files for version 3.04/3.05.

Here is the core of my code:

    if let tesserect = G8Tesseract(language: "eng") {

        // Preprocess the hard-coded test receipt
        self.receiptPhoto.image = UIImage(named: "TradersReceipt")?.g8_blackAndWhite()
        self.receiptPhoto.image = self.receiptPhoto.image?.toGrayScale()
        self.receiptPhoto.image = self.receiptPhoto.image?.binarise()
        self.receiptPhoto.image = self.receiptPhoto.image?.scaleImage()

        // OCR the receipt in receiptPhoto
        tesserect.delegate = self as G8TesseractDelegate
        tesserect.engineMode = .tesseractCubeCombined
        tesserect.pageSegmentationMode = .singleBlock
        tesserect.image = self.receiptPhoto.image
        tesserect.recognize()

        // Update the UITextField in the destination VC being segued to
        textOfReceipt.text = tesserect.recognizedText

    }

Note that the functions toGrayScale, etc., are from the SO post linked to in the first sentence of this post. As can be seen, I've just hard-coded a Trader Joe's receipt for testing, and that receipt is here:

Trader Joe's Receipt

But here's my problem: the text displayed in the UITextView (textOfReceipt.text) is garbage:

Garbage Results from tesseract.recognize

I feel like I'm missing something simple. Any and all help appreciated.

1 Answer

Turns out the code was just fine. Tesseract simply doesn't process Trader Joe's receipts very well, presumably because it can't handle the font Trader Joe's uses and/or gets confused by some of the special characters on the receipt. Other receipts do better, although the quality of the results varies widely from receipt to receipt. If I OCR regular text, e.g., from a book, the results are fantastic.
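
For anyone who wants to put a rough number on that spread between receipts, here is a minimal sketch of a sanity check on the recognized text. It is not part of TesseractOCR: `plausibleCharacterRatio` is a hypothetical helper, and the set of "expected" receipt characters is just my assumption. It only measures what fraction of the output consists of letters, digits, whitespace, and common price/date punctuation, so it can flag output that is mostly symbols but cannot tell whether the words themselves are right.

```swift
import Foundation

/// Hypothetical helper (not a TesseractOCR API): returns the fraction of
/// Unicode scalars in `text` that belong to an assumed "receipt-like" set.
/// Returns 0 for empty text.
func plausibleCharacterRatio(_ text: String) -> Double {
    // Assumption: receipts mostly contain letters, digits, whitespace,
    // and a handful of punctuation marks used in prices and dates.
    var expected = CharacterSet.alphanumerics
    expected.formUnion(.whitespacesAndNewlines)
    expected.insert(charactersIn: "$.,:/-%#@&'()*")

    let scalars = text.unicodeScalars
    guard !scalars.isEmpty else { return 0 }

    // Count the scalars that fall inside the expected set.
    let plausible = scalars.filter { expected.contains($0) }.count
    return Double(plausible) / Double(scalars.count)
}

// Every character here is a letter, digit, space, or period.
let cleanScore = plausibleCharacterRatio("BANANAS 0.19")   // 1.0

// None of these symbols are in the expected set.
let noisyScore = plausibleCharacterRatio("~|^}{")          // 0.0
```

In the code from the question, the string to score would be whatever `recognizedText` holds after `recognize()` returns. A low ratio suggests the receipt is one Tesseract struggles with. A high ratio does not prove the OCR is accurate, since garbage output can still be made of letters.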