Bypassing Tesseract preprocessing

Question

I am running a series of OCR jobs on images from Java, using tess4j as a wrapper for Tesseract. OCR is still taking a significant amount of time (sometimes as much as 5 seconds per image), and I am trying to speed it up.

I do my own preprocessing and binarization of the image, so there is no need for Tesseract to run its own Otsu binarization.
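For context, here is a minimal sketch of what "doing my own binarization" might look like in plain Java. It uses a fixed global threshold instead of Otsu and writes the result into a 1-bit `TYPE_BYTE_BINARY` image. The class name `Binarize` and the threshold value 128 are hypothetical placeholders, not taken from the question; substitute whatever preprocessing you actually use.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: fixed-threshold binarization into a 1-bit image.
class Binarize {

    // Converts any BufferedImage to a packed 1-bit TYPE_BYTE_BINARY image.
    // Pixels whose luma is >= threshold become white, all others black.
    static BufferedImage toBinary(BufferedImage src, int threshold) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of Rec. 601 luma (0..255)
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                dst.setRGB(x, y, luma >= threshold ? 0xFFFFFF : 0x000000);
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        // Tiny synthetic image: one dark pixel and one light pixel.
        BufferedImage src = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        src.setRGB(0, 0, 0x202020);
        src.setRGB(1, 0, 0xE0E0E0);
        BufferedImage bin = toBinary(src, 128); // 128 is a hypothetical threshold
        System.out.println("bits per pixel: " + bin.getColorModel().getPixelSize());
    }
}
```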

I have read a tutorial for iOS that explains how to skip the image-processing step, but I can't find anything equivalent for tess4j.

The tutorial is here: https://github.com/gali8/Tesseract-OCR-iOS/wiki/Tips-for-Improving-OCR-Results
"... if you've already performed your own pre-processing/thresholding [...] you will probably want to bypass the internal Tesseract thresholding step. "

Does anybody know how I could use tess4j (from Java) so that it skips the Otsu binarization?

nguyenq · Accepted Answer · 2015-10-21T03:35:06

Check the tesseract-ocr parameters list for any applicable settings. However, I have read that if you pass in an image that is already binarized, Tesseract will skip its own thresholding step (source).
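If that is right, the saving presumably depends on the image being stored as genuinely 1 bit per pixel. An 8-bit grayscale or RGB image that merely looks black and white would, as far as I know, still be treated as a non-binary input and thresholded again. The sketch below checks for that and repackages an already black-and-white image as `TYPE_BYTE_BINARY` without re-thresholding it. The class and method names are hypothetical. Whether tess4j forwards a 1-bit `BufferedImage` to Tesseract as a binary image (with the black/white polarity Tesseract expects) is an assumption worth verifying, for example by timing OCR with and without the conversion and comparing the recognized text.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: make sure an already-binarized image is stored as 1 bit per pixel.
class BilevelCheck {

    // True if the image is physically stored with 1 bit per pixel
    // (typically TYPE_BYTE_BINARY with its default black/white palette).
    static boolean isOneBitPerPixel(BufferedImage img) {
        return img.getColorModel().getPixelSize() == 1;
    }

    // True if every pixel is pure black or pure white, whatever the storage format.
    static boolean onlyBlackAndWhite(BufferedImage img) {
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y) & 0xFFFFFF; // ignore the alpha byte
                if (rgb != 0x000000 && rgb != 0xFFFFFF) {
                    return false;
                }
            }
        }
        return true;
    }

    // Copies a black-and-white image into TYPE_BYTE_BINARY pixel by pixel (no new threshold).
    static BufferedImage toOneBit(BufferedImage img) {
        if (isOneBitPerPixel(img)) {
            return img; // already packed, nothing to do
        }
        if (!onlyBlackAndWhite(img)) {
            throw new IllegalArgumentException("image still contains grey or colour pixels");
        }
        BufferedImage dst = new BufferedImage(
                img.getWidth(), img.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                dst.setRGB(x, y, img.getRGB(x, y));
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        // An RGB image that looks binary but uses 24 bits per pixel.
        BufferedImage rgb = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        rgb.setRGB(0, 0, 0x000000);
        rgb.setRGB(1, 0, 0xFFFFFF);
        System.out.println("1-bit before: " + isOneBitPerPixel(rgb));
        System.out.println("1-bit after:  " + isOneBitPerPixel(toOneBit(rgb)));
    }
}
```

The converted image would then be handed to tess4j's `Tesseract.doOCR(BufferedImage)` as usual.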
