3
votes

I'm writing a simple OCR program in C#, and I'm using Tesseract 2.0.

My program needs to recognize ONLY capital letters.

For this reason I'm using:

Tesseract ocr = new Tesseract();
ocr.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ");

At this point I pass in an image of a single capital letter. It works quite well, but sometimes it returns a string with TWO letters.

Input:
R
Output:
FE

Now I need to know how to set the page segmentation mode to "single character" to improve the results.

Does anyone know how to do this in C# with Tesseract 2?

I ask because the Tesseract ocr object only exposes the SetVariable methods. In the iOS APIs there is a method for this:

setPageSegMode(TessBaseAPI.PSM_SINGLE_CHAR);

Can anyone help me?

1 Answer

2
votes

Page segmentation mode (PSM) is only available in Tesseract 3.0x, so you'll need a .NET wrapper that is compatible with that version. There is one at https://github.com/charlesw/tesseract.
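
As a rough sketch of how single-character recognition might look with that wrapper: the engine is created from a tessdata folder, the whitelist is set the same way as in the question, and the page segmentation mode is passed to Process. The `./tessdata` path, the `eng` language data, and the `letter.png` file name are assumptions for illustration only, and the exact API may differ between wrapper versions, so check it against the release you install.

```csharp
using System;
using Tesseract; // charlesw/tesseract wrapper (NuGet package "Tesseract")

class SingleCharOcr
{
    static void Main()
    {
        // Assumption: ./tessdata contains eng.traineddata for the
        // Tesseract version that the wrapper targets.
        using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
        {
            // Same whitelist as in the question: capital letters only.
            engine.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ");

            // Assumption: letter.png is an image of one capital letter.
            using (var img = Pix.LoadFromFile("letter.png"))
            // PageSegMode.SingleChar tells Tesseract to treat the whole
            // image as a single character.
            using (var page = engine.Process(img, PageSegMode.SingleChar))
            {
                string text = page.GetText().Trim();
                Console.WriteLine(text);
            }
        }
    }
}
```

Passing the mode per call to Process keeps the engine reusable for other images that need a different segmentation mode.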