I am trying to extract text from an image file using Tesseract OCR in Python, but I am getting an error that I can't figure out how to deal with. My environment seems fine, as I have already tested the OCR from Python on some sample images.
Here is the code:
from PIL import Image
import pytesseract
strs = pytesseract.image_to_string(Image.open('binarized_image.png'))
print(strs)
The following is the error I get from the Eclipse console:
strs = pytesseract.image_to_string(Image.open('binarized_body.png'))
File "C:\Python35x64\lib\site-packages\pytesseract\pytesseract.py", line 167, in image_to_string
return f.read().strip()
File "C:\Python35x64\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 20: character maps to <undefined>
I am using Python 3.5 x64 on Windows 10.
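For what it's worth, the traceback suggests the OCR output file is being decoded with the Windows default code page (cp1252) rather than UTF-8. Byte 0x9d is undefined in cp1252, but it is the last byte of the UTF-8 encoding of a right curly quote (U+201D). Below is a minimal sketch that reproduces the same failure without Tesseract. The sample string and the helper name `decode_ocr_output` are made up for illustration, and it assumes Tesseract's text output is UTF-8.

```python
# Assumption: Tesseract writes its text output as UTF-8.
# The sample text is invented; U+201D encodes to bytes E2 80 9D in UTF-8.
raw = "He said \u201cscan me\u201d".encode("utf-8")


def decode_ocr_output(data: bytes) -> str:
    """Hypothetical helper: decode OCR output bytes explicitly as UTF-8."""
    return data.decode("utf-8").strip()


# Decoding the same bytes with cp1252 fails, because 0x9d has no mapping there.
try:
    raw.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError as exc:
    cp1252_failed = True
    print("cp1252 failed:", exc)

text = decode_ocr_output(raw)
print(ascii(text))  # ascii() keeps the print itself safe on a cp1252 console
```

If this is indeed the cause, reading the output as bytes and decoding it as UTF-8 should avoid the error. Newer pytesseract releases appear to do this already, so upgrading may be enough, though I have not verified that against this exact setup.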
sys.setdefaultencoding
to see if that helps you diagnose the problem? (I'd probably avoid keeping that hack around in production code if you can help it though.) – Benjamin Hodgson