I have used Tesseract 3.04 with Python and pytesseract (from PyPI). Now I want to use the new LSTM-based 4.00.00alpha.
I'm using Kali Linux, so I installed libtesseract4 (using apt-get). It created a folder named 4.00 in tesseract-ocr, but when I try to use it with pytesseract, it does not recognize the --oem option.
the code is:

pytesseract.image_to_string(Image.open(filename), lang="eng", config='--oem 2')

Result:

read_params_file: Can't open 1

The --oem option also does not appear when I run tesseract -h.
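Since --oem is missing from tesseract -h, the tesseract binary on the PATH is most likely still the 3.04 one; by default pytesseract simply runs whatever "tesseract" resolves to. A minimal sketch to check this, assuming the binary accepts --version (the helper names below are hypothetical, not part of pytesseract):

```python
import re
import shutil
import subprocess


def parse_tesseract_version(text):
    """Return (major, minor) parsed from `tesseract --version` output."""
    # The first line looks like "tesseract 3.04.01" or "tesseract 4.0.0-beta.1".
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognised version output: %r" % text[:80])
    return int(match.group(1)), int(match.group(2))


def installed_tesseract_version(cmd="tesseract"):
    """Locate `cmd` on the PATH, run it with --version and return (path, version)."""
    path = shutil.which(cmd)
    if path is None:
        raise FileNotFoundError("%s is not on the PATH" % cmd)
    # Some older builds print the version on stderr, so both streams are merged.
    result = subprocess.run([path, "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return path, parse_tesseract_version(result.stdout)


def has_lstm_engine(version):
    """The LSTM engine (selected with --oem) was introduced in Tesseract 4.0."""
    return version >= (4, 0)
```

Calling installed_tesseract_version() shows which binary is actually used. If the 4.0 binary is installed somewhere else, pytesseract can be pointed at it by setting pytesseract.pytesseract.tesseract_cmd to its full path.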

It also does not recognize the training data files in tesseract-ocr/4.00/tessdata; it only recognizes the training data in tesseract-ocr/tessdata.
If the problem is with pytesseract, could you please tell me how to set up a Python wrapper for Tesseract 4?
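Which tessdata folder is searched is decided by the tesseract binary (its built-in default path or the TESSDATA_PREFIX environment variable), not by pytesseract. Tesseract 4 also accepts a --tessdata-dir option, which can be passed through pytesseract's config string. A sketch assuming the folders live under /usr/share (the full prefix is not given above); find_tessdata_dirs and tessdata_config are hypothetical helpers:

```python
import os
import shlex

# Hypothetical locations; adjust to wherever the 4.00 folder actually is.
CANDIDATES = [
    "/usr/share/tesseract-ocr/4.00/tessdata",
    "/usr/share/tesseract-ocr/tessdata",
]


def find_tessdata_dirs(lang, candidates=CANDIDATES):
    """Return the candidate directories that contain <lang>.traineddata."""
    found = []
    for directory in candidates:
        if os.path.isfile(os.path.join(directory, lang + ".traineddata")):
            found.append(directory)
    return found


def tessdata_config(tessdata_dir, extra=""):
    """Build a config string that points Tesseract at one tessdata folder."""
    # Recent pytesseract versions split `config` with shlex, so quote the path.
    option = "--tessdata-dir " + shlex.quote(tessdata_dir)
    return (option + " " + extra).strip()
```

For example, config=tessdata_config(find_tessdata_dirs("eng")[0], "--oem 1") can be passed to pytesseract.image_to_string; this only helps once the binary being called is a 4.x one.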

Thanks

2 Answers

I would recommend using tesserocr. It supports Tesseract 4 and is a true wrapper around the C++ API, in contrast to pytesseract, which just calls the tesseract CLI. Training is a whole different story, and you should follow the guide provided by the developers.
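A minimal tesserocr sketch, assuming a tesserocr build linked against Tesseract 4 whose PyTessBaseAPI accepts the oem keyword (ocr_with_tesserocr is a hypothetical name). If the traineddata is not in the default location, PyTessBaseAPI also takes a path argument for the tessdata folder.

```python
def ocr_with_tesserocr(image_path, lang="eng"):
    """OCR one image file through tesserocr's PyTessBaseAPI with the LSTM engine."""
    # Imported lazily so this module still loads where tesserocr is not installed.
    from tesserocr import OEM, PSM, PyTessBaseAPI

    # OEM.LSTM_ONLY selects the neural-net engine; PSM.AUTO is page segmentation mode 3.
    with PyTessBaseAPI(lang=lang, psm=PSM.AUTO, oem=OEM.LSTM_ONLY) as api:
        api.SetImageFile(image_path)
        return api.GetUTF8Text()
```

Usage would be print(ocr_with_tesserocr("page.png")). The with block makes sure the API handle is released after use.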

You may try the following. It works for Tesseract 4.0.0a with Python 3.6.

from PIL import Image
import pytesseract
ocr = pytesseract.image_to_string(Image.open(filename), lang="eng",
                                  config="--psm 3 --oem 2")

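--psm 3 is the default page segmentation mode (fully automatic page segmentation), and --oem 2 selects the legacy Tesseract engine combined with the LSTM engine.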
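Both flags take plain integers. A small sketch that builds and validates the config string, using the mode ranges documented for Tesseract 4 (PSM 0 to 13, OEM 0 to 3); build_config is a hypothetical helper, not a pytesseract function. Note that --oem 0 and --oem 2 need traineddata that still contains the legacy model; LSTM-only files, such as those from the tessdata_fast repository, only work with --oem 1 or 3.

```python
# Page segmentation modes accepted by Tesseract 4.
PSM_MODES = range(0, 14)

# OCR engine modes accepted by Tesseract 4.
OEM_MODES = {
    0: "legacy Tesseract engine only",
    1: "LSTM neural-net engine only",
    2: "legacy + LSTM engines",
    3: "default, based on what is available",
}


def build_config(psm=3, oem=3):
    """Return a '--psm N --oem M' string after checking both values."""
    if psm not in PSM_MODES:
        raise ValueError("psm must be between 0 and 13, got %r" % psm)
    if oem not in OEM_MODES:
        raise ValueError("oem must be between 0 and 3, got %r" % oem)
    return "--psm {} --oem {}".format(psm, oem)
```

build_config(3, 2) produces the same string as in the snippet above.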

Hope this helps.