I have been using tesseract-ocr (via pytesseract) on Spanish text, and it achieves very high accuracy when the language is set to Spanish. If the language is not set to Spanish, it does not perform nearly as well. So I'm assuming that Tesseract applies several language-specific post-processing models for spell checking and improving accuracy. Does anybody know which of those models (e.g. edit distance, noisy channel modeling) Tesseract actually applies? Thanks in advance!
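To be concrete about the kind of post-processing I have in mind, here is a minimal sketch of edit-distance correction against a word list. This is only an illustration of the idea, not something I know Tesseract to do. The names `edit_distance`, `correct`, and `VOCAB`, the tiny Spanish word list, and the `max_distance` threshold are all made up for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def correct(word: str, vocabulary: list[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or the input if nothing is close enough."""
    best = min(vocabulary, key=lambda v: edit_distance(word, v))
    return best if edit_distance(word, best) <= max_distance else word


# Hypothetical word list; a real one would be a full Spanish dictionary.
VOCAB = ["casa", "perro", "gato", "ciudad"]

if __name__ == "__main__":
    # "perr0" stands in for a typical OCR confusion between "o" and "0".
    print(correct("perr0", VOCAB))
```

A noisy channel model would go one step further and weight each candidate by how likely the word is in the language and how likely the specific character confusion is, instead of treating every edit as equally costly.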