2
votes

I'd like to create a simple WPF program for OCR scanning with Tesseract in which the user can choose the language(s) to scan with.² For some reason, Tesseract expects a tessdata folder with the language files directly in it, rather than in per-language subfolders.

using (var engine = new TesseractEngine(@"./tessdata", "deu", EngineMode.Default))

doesn't work if the deu files aren't located directly in the tessdata folder - neither does it work when using @"./tessdata/deu".

It only works when the language files are located directly in the tessdata folder (including in the project structure).

How do I properly make use of all available languages?

²Actually, if possible later on I'd like to auto-detect the language in images - e.g. by scanning each image with each language and checking which language had the best result. If you have any idea on how this could be done please let me know.
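
One way the "try each language and keep the best result" idea might be sketched is below. This is only an illustration under stated assumptions: `LanguagePicker` and `PickBest` are hypothetical names, and the scoring function is passed in as a delegate so the selection logic does not depend on any particular OCR call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the Tesseract wrapper.
public static class LanguagePicker
{
    // Calls score(lang) once per language and returns the language with the
    // highest score, or null if the sequence is empty.
    public static string PickBest(IEnumerable<string> languages, Func<string, float> score)
    {
        string best = null;
        float bestScore = float.NegativeInfinity;

        foreach (var lang in languages)
        {
            float s = score(lang);
            if (s > bestScore)
            {
                bestScore = s;
                best = lang;
            }
        }

        return best;
    }
}
```

With the .NET Tesseract wrapper, the `score` delegate could create a `TesseractEngine` for the given language, process the image, and return the page's mean confidence (assuming the wrapper's `Page.GetMeanConfidence()`). Whether that confidence reliably separates the right language from the wrong ones is an open question and would need to be checked on real images.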

1
Really, WPF doesn't play into this. I'm using Tesseract in one of my projects. As the directory structure is so fragile, I hold the language files and native DLLs within my application's resources, and on startup I dump them into the required locations when needed. You could do the same thing, dumping the required language files on demand. For example, you could have a dropdown of languages, and as the user switches, you could drop the required files. The BIG problem would be where you can write to the filesystem, and how to rebase Tesseract to that directory... – user1228
(My application was a website, so I could just dump them into the /bin folder. If your application is in /program files/, you won't be able to do that.) – user1228
Tesseract supports multiple languages at once, such as "eng+deu", but I've never seen a case that would use more than two languages (OK, maybe three). Trying every language won't work because for the incorrect ones, the output is going to be useless garbage anyway. – nguyenq
@nguyenq That's why I also asked about auto-detecting the language of the image - e.g. by detecting which language's output looks the least like garbage. Anyways, that's not what the question is about - it's about making use of all available languages - not about making use of all of them at once. – mYnDstrEAm

1 Answer

2
votes

You should create a tessdata directory in the Debug folder of your project (next to the built executable) and put the language files directly in it.
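
A minimal sketch of how that layout could be used from code, for example to fill a language dropdown. It assumes the language files follow the usual `<lang>.traineddata` naming and sit directly in a `tessdata` folder next to the executable. `TessdataHelper` and its methods are hypothetical names, not part of the Tesseract wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper for locating tessdata and listing installed languages.
public static class TessdataHelper
{
    // Resolves the tessdata folder next to the running executable
    // (bin\Debug while debugging), independent of the working directory.
    public static string GetTessdataPath()
    {
        return Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "tessdata");
    }

    // Returns the language codes ("deu", "eng", ...) of all *.traineddata
    // files located directly in the given folder; subfolders are ignored.
    public static List<string> ListLanguages(string tessdataPath)
    {
        if (!Directory.Exists(tessdataPath))
            return new List<string>();

        return Directory.EnumerateFiles(tessdataPath, "*.traineddata")
            .Select(file => Path.GetFileNameWithoutExtension(file))
            .OrderBy(code => code, StringComparer.Ordinal)
            .ToList();
    }

    // Builds a combined language string such as "deu+eng".
    public static string JoinLanguages(IEnumerable<string> codes)
    {
        return string.Join("+", codes);
    }
}
```

The selected code, or a joined string such as "eng+deu", could then be passed as the language argument, e.g. `new TesseractEngine(TessdataHelper.GetTessdataPath(), "deu", EngineMode.Default)`.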