Swift 3 Tesseract OCR recognize returning garbage results

Question

Thanks to many Tesseract OCR-related posts on SO, this one in particular, I'm now on my way to integrating Tesseract into an app for OCR'ing grocery receipts. However, I am getting garbage results and I can't figure out why. I've triple-checked the tessdata language files, which are the English Cube data files for version 3.04/3.05.

Here is the core of my code:

        if let tesseract = G8Tesseract(language: "eng") {

            // Preprocess the hard-coded test receipt before handing it to Tesseract
            self.receiptPhoto.image = UIImage(named: "TradersReceipt")?.g8_blackAndWhite()
            self.receiptPhoto.image = self.receiptPhoto.image?.toGrayScale()
            self.receiptPhoto.image = self.receiptPhoto.image?.binarise()
            self.receiptPhoto.image = self.receiptPhoto.image?.scaleImage()

            // OCR the receipt in receiptPhoto
            tesseract.delegate = self as G8TesseractDelegate
            tesseract.engineMode = .tesseractCubeCombined
            tesseract.pageSegmentationMode = .singleBlock
            tesseract.image = self.receiptPhoto.image
            tesseract.recognize()

            // Update the UITextField in the destination VC being segued to
            textOfReceipt.text = tesseract.recognizedText

        }

Note that the functions toGrayScale, etc., are from the SO post linked to in the first sentence of this post. As can be seen, I've just hard-coded a Trader Joe's receipt for testing, and that receipt is here:

Trader Joe's Receipt

But here's my problem: the text that gets displayed in the UITextView (textOfReceipt.text) is garbage:

Garbage Results from tesseract.recognize

I feel like I'm missing something simple. Any and all help appreciated.

Yorma Yorma · Accepted Answer · 2017-08-28T23:26:56

It turns out the code was fine. Tesseract simply doesn't process Trader Joe's receipts well, presumably because it can't handle the font Trader Joe's uses and/or gets confused by some of the special characters on the receipt. Other receipts do better, although the quality of the results varies widely from receipt to receipt. If I OCR regular text, e.g., from a book, the results are fantastic.
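One way to check this kind of thing is to run identical settings over several test images and compare the output side by side. The following is only a rough sketch, not the exact code I used: it assumes the TesseractOCR pod from the question, and the function name recognizeText(inImagesNamed:) and any asset names other than "TradersReceipt" are hypothetical.

```swift
import UIKit
import TesseractOCR

/// Runs the same Tesseract settings over several bundled test images and
/// returns the recognized text per image name, so results can be compared.
/// The asset names passed in are placeholders for whatever is in the bundle.
func recognizeText(inImagesNamed names: [String]) -> [String: String] {
    var results: [String: String] = [:]
    for name in names {
        // Skip images missing from the bundle, or a failed engine setup
        // (for example, when the tessdata files cannot be found).
        guard let image = UIImage(named: name),
              let tesseract = G8Tesseract(language: "eng") else { continue }

        // Same engine and segmentation settings as in the question.
        tesseract.engineMode = .tesseractCubeCombined
        tesseract.pageSegmentationMode = .singleBlock
        tesseract.image = image.g8_blackAndWhite()
        tesseract.recognize()

        results[name] = tesseract.recognizedText ?? ""
    }
    return results
}
```

Calling it with something like recognizeText(inImagesNamed: ["TradersReceipt", "BookPage"]), where "BookPage" is a hypothetical asset containing ordinary printed text, makes it easy to see whether the problem follows the image or the code.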
