Tesseract OCR fails on TIFF files

Question

I have a multi-page .tif file and I am trying to extract text from it using Tesseract OCR, but I am getting this error:

TypeError: Unsupported image object

Code

from PIL import Image
import pytesseract

img = Image.open('Group 1/1_CHE_MDC_1.tif')
text = pytesseract.image_to_string(img.seek(0))  # OCR on 1st Page
text = ' '.join(text.split())
print(text)

Any idea why this is happening?

Blender · Accepted Answer · 2018-09-16T03:10:49

Image.seek moves to the requested frame in place and returns None, so you're essentially running:

pytesseract.image_to_string(None)

Instead do:

img.seek(0)
text = pytesseract.image_to_string(img)
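
Since the file has multiple pages, the same pattern can be repeated for every frame. Below is a minimal sketch, not a definitive implementation: the helper name ocr_all_pages and its ocr parameter are hypothetical (they are not part of Pillow or pytesseract), and it assumes the image object exposes n_frames and seek(), as Pillow does for multi-frame TIFFs.

```python
def ocr_all_pages(img, ocr=None):
    """Return a list with the OCR text of every page (frame) in `img`.

    `ocr` is a hypothetical hook: any callable that takes an image and
    returns a string. It defaults to pytesseract.image_to_string.
    """
    if ocr is None:
        # Imported lazily so the helper can be exercised without Tesseract.
        import pytesseract
        ocr = pytesseract.image_to_string

    texts = []
    # Single-frame images may lack n_frames, so fall back to one page.
    for page in range(getattr(img, "n_frames", 1)):
        img.seek(page)   # moves to the frame in place; returns None
        text = ocr(img)  # pass the image itself, not the result of seek()
        texts.append(" ".join(text.split()))  # collapse whitespace as in the question
    return texts
```

With the file from the question (this needs Pillow, pytesseract and the Tesseract binary installed), it could be used like this:

from PIL import Image

img = Image.open('Group 1/1_CHE_MDC_1.tif')
for number, text in enumerate(ocr_all_pages(img), start=1):
    print(number, text)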
