OCR (text recognition) result from OpenCV 3.1 + Tesseract 3.04 varies depending on the order of recognition

Question

I'm currently using the following sample code:

https://github.com/opencv/opencv_contrib/blob/master/modules/text/samples/textdetection.cpp

The weird thing is that the OCR result seems to vary depending on the order in which I pass images to the OCR module.

For example, given 100 frames containing text, the recognized result can differ between these two cases:

- I pass the frames to the Tesseract module in sequential order (100 frames total)
- I pass the same frames to the Tesseract module in a non-sequential order (100 frames total)

Ideally, both cases should produce the same result.

I have already confirmed that the erFilter stage is not the cause: its output is exactly the same in both cases. The difference seems to arise inside Tesseract or the Tesseract wrapper in OpenCV.

The difference ranges from a slightly different confidence value to entirely different recognized text.

It feels as though OpenCV or Tesseract remembers something from earlier frames and lets it affect the OCR result for a new frame, but I couldn't find any documentation saying so.

Please let me know whether this is normal behavior for OpenCV/Tesseract.

nguyenq · Accepted Answer · 2016-11-16T01:33:34

Try clearing the adaptive data with ClearAdaptiveClassifier(), or turn off the adaptive classifier with these config variables:

classify_enable_learning 0
classify_enable_adaptive_matcher 0

See the Tesseract FAQ.
