
Tesseract's setVariable whitelist works fine for English. For example, I use this to recognize only digits and letters in an image (excluding special characters such as &*^%!).


But I can't do the same thing for Thai.


Is there a different principle? This does not work: instead of all the characters I specified, I receive only digits in the output. Tesseract ignores all the Thai letters I put into the whitelist.

How can I pass this variable correctly?


1 Answer


You might need to install the Thai language data first. Please refer to the download list here: https://code.google.com/p/tesseract-ocr/downloads/list

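Then replace "eng" with "tha" in your code so that the OCR uses the new language data. As far as I know, the whitelist can only select characters that the loaded language data already contains, so with "eng" loaded the Thai letters are most likely just dropped and only the digits survive.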
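
In case it helps, here is a rough sketch of the same idea using the tesseract command-line tool from Python rather than your setVariable call. It assumes the tesseract binary and the Thai data ("tha") are installed; the helper names (thai_characters, build_command, run_ocr) and the file name sample.png are made up for illustration, and the character ranges are just the Thai letters, vowel signs and tone marks from the Unicode Thai block, so adjust them to what you actually need.

```python
import subprocess


def thai_characters():
    """Thai consonants, vowels, tone marks and related signs.

    Covers U+0E01..U+0E3A and U+0E40..U+0E4E of the Unicode Thai block.
    Thai digits (U+0E50..U+0E59) are deliberately left out.
    """
    ranges = [(0x0E01, 0x0E3A), (0x0E40, 0x0E4E)]
    return "".join(chr(cp) for lo, hi in ranges for cp in range(lo, hi + 1))


def build_command(image_path, whitelist, lang="tha"):
    """Build the tesseract CLI call (hypothetical helper).

    -l selects the language data, -c sets a config variable;
    "stdout" as the output base prints the recognized text.
    """
    return [
        "tesseract", image_path, "stdout",
        "-l", lang,
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def run_ocr(image_path, whitelist, lang="tha"):
    """Run tesseract and return its text output (needs tesseract installed)."""
    result = subprocess.run(
        build_command(image_path, whitelist, lang),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Digits plus Thai characters, mirroring "digits and letters only".
WHITELIST = "0123456789" + thai_characters()
```

You would then call run_ocr("sample.png", WHITELIST). The important part is that the language passed with -l (or to init in your code) is "tha", not "eng"; whether the whitelist is honoured may also depend on the Tesseract version and engine you are using.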