My project is in Python 3.6.2. I'm trying to decide whether images are worth downloading at all (based on their aspect ratio) by reading only the header (the first ~100 bytes of the online file). So far I'm just testing with imghdr and Pillow.

Image.open fails at the end with:

File "C:\Program Files (x86)\Python36-32\lib\site-packages\PIL\Image.py", line 2349, in open % (filename if filename else fp))

OSError: cannot identify image file <_io.BytesIO object at 0x02F41960>

I found the Pillow 2.8.0 release notes, which seemed to suggest I'd be able to use Image.open(response.raw). I guessed I should be able to reuse the already-downloaded header after resetting it with seek(0).

Other answers with this error seem to deal with saving the image buffer to an actual file, which I am trying to avoid (just reusing the downloaded bytes from response.raw for all my test/checks, and not making multiple download requests to any server.)

Where am I going wrong please?

Here is my sample code:

import requests
from PIL import Image
import imghdr
import io

if __name__ == '__main__':
    url = "https://ichef-1.bbci.co.uk/news/660/cpsprodpb/37B5/production/_89716241_thinkstockphotos-523060154.jpg"
    try:
        response = requests.get(url, stream=True)
        if response.status_code == 200:
            response.raw.decode_content = True

            # Grab first 100 bytes as potential image header
            header = response.raw.read(100)
            ext = imghdr.what(None, h=header)
            print("Found: " + str(ext))  # ext is None when no image type is recognised
            if ext is not None:     # Proceed to other tests if we received an image at all
                header = io.BytesIO(header)
                header.seek(0)
                im = Image.open(header)
                im.verify()

                # other image-related tasks here
        else:
            print("Received error " + str(response.status_code))
    except requests.ConnectionError as e:
        print(e)

1 Answer

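The first 100 bytes are not enough for Pillow to identify this file, which is why Image.open() raises the OSError. You have to get the rest of the image data before calling Image.open().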

This is what I mean:

import requests
from PIL import Image
import imghdr
import io

if __name__ == '__main__':
    url = "https://ichef-1.bbci.co.uk/news/660/cpsprodpb/37B5/production/_89716241_thinkstockphotos-523060154.jpg"
    try:
        response = requests.get(url, stream=True)
        if response.status_code == 200:
            response.raw.decode_content = True

            # Grab first 100 bytes as potential image header
            header = response.raw.read(100)
            ext = imghdr.what(None, h=header)
            print("Found: " + str(ext))  # ext is None when no image type is recognised
            if ext is not None:     # Proceed to other tests if we received an image at all
                data = header + response.raw.read()  # GET THE REST OF THE FILE
                data = io.BytesIO(data)
                im = Image.open(data)
                im.verify()

                # other image-related tasks here
        else:
            print("Received error " + str(response.status_code))
    except requests.ConnectionError as e:
        print(e)
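
If the goal is still to avoid fetching the whole file, one possible alternative (not what the code above does) is to read the stream in small chunks and stop as soon as the dimensions show up. In a JPEG the width and height live in the start-of-frame segment, which can sit well past byte 100 when metadata segments come first. The following is only a minimal sketch using the standard library: it handles JPEG only, it skips rarer marker cases, and the names jpeg_size and probe_jpeg_size as well as the chunk_size and max_bytes defaults are made up for illustration.

```python
import struct

# Start-of-frame (SOF) markers; their segment holds the image height and width.
# 0xC4, 0xC8 and 0xCC fall in the same range but are not frame headers.
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_size(data):
    """Return (width, height) if a SOF segment lies within data, else None."""
    if len(data) >= 2 and data[:2] != b"\xff\xd8":
        raise ValueError("data does not start with a JPEG SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:
            pos += 1  # padding byte before the real marker
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            if pos + 9 > len(data):
                return None  # frame header started but is cut off
            # Layout after the length field: precision (1 byte), height, width
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + length  # skip the marker bytes plus the whole segment
    return None  # ran out of bytes before reaching a SOF segment


def probe_jpeg_size(stream, chunk_size=1024, max_bytes=65536):
    """Read from a file-like object until the size is known or max_bytes is hit.

    Returns (size_or_None, bytes_read_so_far) so the bytes can be reused.
    """
    buf = b""
    while len(buf) < max_bytes:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        buf += chunk
        size = jpeg_size(buf)
        if size is not None:
            return size, buf
    return None, buf
```

With the code from the answer, this could be called as size, buf = probe_jpeg_size(response.raw), and if the aspect ratio looks right the remainder can be appended with io.BytesIO(buf + response.raw.read()) before handing it to Image.open(). Pillow's own ImageFile.Parser, which accepts data through feed(), may be a better fit if other formats need to be supported as well.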