Tesseract setVariable whitelist for another language

Question

Tesseract's SetVariable whitelist works fine for English. For example, I use this to recognize only digits and letters in an image (excluding special characters such as &*^%!):

myOCR->SetVariable("tessedit_char_whitelist",
"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ");

But I can't do the same thing for Russian:

myOCR->SetVariable("tessedit_char_whitelist", "0123456789абвгдежзийклмнопрстуфхцчшщъыьэюяАБВГДЕЖЗИЙКЛМОПРСТУФХЦЧШЩЭЮЯ");

Is there a different principle? This doesn't work: instead of all the specified characters, I receive only digits in the output. Tesseract ignores all the Russian letters I put into the whitelist. A blacklist didn't work either. Is there any way to get around this? Thanks.

Alex Hoppus · Accepted Answer · 2013-02-27T15:01:43

So the answer is to use the Unicode codes of these symbols in the whitelist, though I don't know exactly how to do this.
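
One possible way to do that is to build the whitelist from Unicode code points and encode it as UTF-8 by hand, so the result does not depend on how the source file is saved. This is only a sketch: `AppendUtf8` and `BuildRussianWhitelist` are hypothetical helper names, and it assumes that Tesseract expects the whitelist as a UTF-8 string and that the Russian language data is loaded. Whether this alone fixes the problem may depend on the Tesseract version. The range U+0410..U+044F covers А-Я and а-я (including Н, Ъ, Ы and Ь, which are missing from the uppercase part of the list in the question) but not Ё/ё.

```cpp
#include <cassert>
#include <string>

// Append one Unicode code point (below U+0800) to `out` as UTF-8.
void AppendUtf8(std::string& out, unsigned int cp) {
    assert(cp < 0x800);  // this sketch handles only 1- and 2-byte sequences
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));    // leading byte
        out += static_cast<char>(0x80 | (cp & 0x3F));  // continuation byte
    }
}

// Hypothetical helper: digits plus the Russian letters U+0410..U+044F.
std::string BuildRussianWhitelist() {
    std::string whitelist = "0123456789";
    for (unsigned int cp = 0x0410; cp <= 0x044F; ++cp) {
        AppendUtf8(whitelist, cp);
    }
    return whitelist;
}

// Usage with an already initialized tesseract::TessBaseAPI* myOCR,
// as in the question:
//   const std::string whitelist = BuildRussianWhitelist();
//   myOCR->SetVariable("tessedit_char_whitelist", whitelist.c_str());
```

If the source file is known to be saved as UTF-8 and the compiler keeps string literals in UTF-8, passing the Cyrillic literal directly should produce the same bytes; building the string from code points just removes that dependency.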
