Thanks to many SO posts about Tesseract OCR, especially this one, I'm now on my way to integrating Tesseract into an app for OCR'ing grocery receipts. However, I'm getting garbage results and can't figure out why. I've triple-checked the tessdata language files, which are the English Cube data files for version 3.04/3.05.
Here is the core of my code:
```swift
if let tesserect = G8Tesseract(language: "eng") {
    self.receiptPhoto.image = UIImage(named: "TradersReceipt")?.g8_blackAndWhite()
    self.receiptPhoto.image = self.receiptPhoto.image?.toGrayScale()
    self.receiptPhoto.image = self.receiptPhoto.image?.binarise()
    self.receiptPhoto.image = self.receiptPhoto.image?.scaleImage()
    // OCR the receipt in receiptPhoto
    tesserect.delegate = self as G8TesseractDelegate
    tesserect.engineMode = .tesseractCubeCombined
    tesserect.pageSegmentationMode = .singleBlock
    tesserect.image = self.receiptPhoto.image
    tesserect.recognize()
    // Update the UITextField in the destination VC being segued to
    textOfReceipt.text = tesserect.recognizedText
}
```
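The toGrayScale() and binarise() extensions aren't reproduced here, so for reference, here's a minimal, framework-free sketch of what a grayscale-then-binarise step typically does. It works on a raw RGBA8 byte buffer rather than a UIImage so it runs anywhere. The function names, the BT.601 luma weights, and the use of Otsu's global threshold are all assumptions of this sketch; the linked post's implementations may differ (for example, by using a fixed threshold).

```swift
// Hypothetical sketch: RGBA8 buffer -> 8-bit grayscale -> black/white
// using a single global Otsu threshold. Not the linked post's code.

/// Converts packed RGBA8 pixels (4 bytes per pixel, alpha ignored) to
/// 8-bit luma with the ITU-R BT.601 weights 0.299 R + 0.587 G + 0.114 B.
func toGrayscale(rgba: [UInt8]) -> [UInt8] {
    precondition(rgba.count % 4 == 0, "expected 4 bytes per pixel")
    var gray = [UInt8]()
    gray.reserveCapacity(rgba.count / 4)
    for i in stride(from: 0, to: rgba.count, by: 4) {
        let r = Double(rgba[i]), g = Double(rgba[i + 1]), b = Double(rgba[i + 2])
        let luma = 0.299 * r + 0.587 * g + 0.114 * b
        gray.append(UInt8(min(255.0, luma.rounded())))
    }
    return gray
}

/// Otsu's method: picks the threshold t that maximizes the between-class
/// variance of the "low" class (values <= t) and the "high" class (> t).
func otsuThreshold(_ gray: [UInt8]) -> UInt8 {
    var histogram = [Int](repeating: 0, count: 256)
    for v in gray { histogram[Int(v)] += 1 }
    let total = Double(gray.count)
    var sumAll = 0.0
    for t in 0..<256 { sumAll += Double(t * histogram[t]) }

    var lowWeight = 0.0   // number of pixels with value <= t
    var lowSum = 0.0      // sum of those pixel values
    var bestThreshold = 0
    var bestVariance = -1.0
    for t in 0..<256 {
        lowWeight += Double(histogram[t])
        lowSum += Double(t * histogram[t])
        if lowWeight == 0 { continue }
        let highWeight = total - lowWeight
        if highWeight == 0 { break }
        let lowMean = lowSum / lowWeight
        let highMean = (sumAll - lowSum) / highWeight
        let diff = lowMean - highMean
        let betweenVariance = lowWeight * highWeight * diff * diff
        if betweenVariance > bestVariance {
            bestVariance = betweenVariance
            bestThreshold = t
        }
    }
    return UInt8(bestThreshold)
}

/// Maps each pixel to 0 (black, ink) if it is <= threshold, else 255 (white, paper).
func binarise(_ gray: [UInt8], threshold: UInt8) -> [UInt8] {
    return gray.map { $0 <= threshold ? 0 : 255 }
}

// Usage: a 4-pixel "image" with two dark (ink) and two light (paper) pixels.
let rgba: [UInt8] = [
    20, 20, 20, 255,
    30, 30, 30, 255,
    220, 220, 220, 255,
    240, 240, 240, 255,
]
let gray = toGrayscale(rgba: rgba)
let binary = binarise(gray, threshold: otsuThreshold(gray))
print(binary) // [0, 0, 255, 255]
```

A global threshold like this is the simplest choice; receipts photographed under uneven lighting are often handled better by a local (adaptive) threshold, which this sketch does not attempt.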
Note that toGrayScale(), binarise(), and scaleImage() are UIImage extensions from the SO post linked in the first sentence. As you can see, I've hard-coded a Trader Joe's receipt for testing; that receipt is here:
But here's my problem: the text displayed in the UITextView (textOfReceipt.text) is garbage:
[Screenshot: garbage results from tesseract.recognize]
I feel like I'm missing something simple. Any and all help appreciated.