I am using Tesseract for OCR and have added a few additional words to "fin.user-words" (I would like to avoid creating a new word list and replacing tessdata/fin.word-dawg with it). This works from the command prompt:
>tesseract image.png result -l fin TestConfig
where TestConfig (a Tesseract configuration file located under .../tessdata/configs) suppresses the system dictionaries and forces Tesseract to load my words:
load_system_dawg F
load_freq_dawg F
user_words_suffix user-words
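For reference, the config file is plain text with one `name value` pair per line. Below is a minimal sketch of generating the same file from Java, using only the standard library (not Tess4J). The class `TessConfigSketch` and its methods are hypothetical names made up for illustration, and the output directory is an assumption passed as an argument.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Tesseract or Tess4J.
public class TessConfigSketch {

    // Renders each variable as one "name value" line, in insertion order.
    static String render(Map<String, String> variables) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : variables.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // The three settings from the TestConfig file shown above.
    static Map<String, String> testConfigVariables() {
        Map<String, String> v = new LinkedHashMap<>();
        v.put("load_system_dawg", "F");
        v.put("load_freq_dawg", "F");
        v.put("user_words_suffix", "user-words");
        return v;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the tessdata/configs directory;
        // without it, the file is written to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        Path file = dir.resolve("TestConfig");
        Files.write(file, render(testConfigVariables()).getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + file.toAbsolutePath());
    }
}
```

Running it with the configs directory as the only argument would write a file with the same three lines as above.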
I am trying to replicate this command-line procedure in Java, but Tesseract seems to ignore the configuration options. Here is the relevant part of the Java code:
public static void TestTesseract(BufferedImage image) {
    Tesseract instance = Tesseract.getInstance();
    instance.setLanguage("fin");
    // Same three settings as in the TestConfig file
    instance.setTessVariable("load_system_dawg", "F");
    instance.setTessVariable("load_freq_dawg", "F");
    instance.setTessVariable("user_words_suffix", "user-words");
    try {
        String result = instance.doOCR(image);
        System.out.println(result);
    } catch (TesseractException e) {
        System.err.println(e.getMessage());
    }
}
The closest related question I could find suggests the call below; however, I could not find a setConfigs method:
instance.setConfigs(Arrays.asList("bazaar"));