
I am currently writing a C++ program that should read hex data from JPEG images. I have to compile it into a single Windows executable without any external resources (such as the "tessdata" directory or config files). Since I am not reading any words or sentences, I don't need any dictionaries or languages.

My problem is that I cannot find a way to initialize the API without any language files. Every example uses something like this:

```cpp
tesseract::TessBaseAPI api;
// Init() returns 0 on success and -1 on failure
if (api.Init(NULL, "eng")) {
    // error handling
    return -1;
}
// do stuff
```

I also found that I can call `Init` without a language argument and with `OEM_TESSERACT_ONLY`:

```cpp
if (api.Init(NULL, NULL, tesseract::OcrEngineMode::OEM_TESSERACT_ONLY)) {
    // error handling
}
```

I expected this to disable the language/dictionary, but passing NULL just defaults to "eng". It seems that Tesseract still requires a language file to initialize and only disables it afterwards.

The same seems to apply to every other solution I have found so far: I always need .traineddata files to initialize the API, and can only disable them afterwards or through config files.

My question is: is there any way to initialize the Tesseract API in C++ using only the executable and no other resource files?


1 Answer


No. Tesseract always needs at least one language data file (.traineddata; the default language is eng), and osd.traineddata in addition for orientation and script detection. Without a language data file, Tesseract is useless.

Your post suggests that you have made several wrong assumptions (e.g. about OEM_TESSERACT_ONLY), so if you describe what you are trying to achieve with Tesseract, you may get better advice.