1
votes

I was trying to download images with url's that change but got an error.

url_image="http://www.joblo.com/timthumb.php?src=/posters/images/full/"+str(title_2)+"-poster1.jpg&h=333&w=225"

user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
headers = {'User-Agent': user_agent}
req = urllib.request.Request(url_image, None, headers)


print(url_image)
#image, h = urllib.request.urlretrieve(url_image)
with urllib.request.urlopen(req) as response:
    the_page = response.read()

#print (the_page)


with open('poster.jpg', 'wb') as f:
    f.write(the_page)

Traceback (most recent call last): File "C:\Users\luke\Desktop\scraper\imager finder.py", line 97, in with urllib.request.urlopen(req) as response: File "C:\Users\luke\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 162, in urlopen return opener.open(url, data, timeout) File "C:\Users\luke\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 465, in open response = self._open(req, data) File "C:\Users\luke\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 483, in _open '_open', req) File "C:\Users\luke\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 443, in _call_chain result = func(*args) File "C:\Users\luke\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1268, in http_open return self.do_open(http.client.HTTPConnection, req) File "C:\Users\luke\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1243, in do_open r = h.getresponse() File "C:\Users\luke\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1174, in getresponse response.begin() File "C:\Users\luke\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 282, in begin version, status, reason = self._read_status() File "C:\Users\luke\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 264, in _read_status raise BadStatusLine(line) http.client.BadStatusLine:

1
Try without the headers (or does the server require you be using Mozilla on Windows for some reason?). Also, we don't know what title_2 is. If there are odd characters or spaces, then it will need to be encoded.Fhaab
just use 10-cloverfield-lane for title_2 and without the headers the same error message comes upspark

1 Answers

1
votes

My advice is to use urlib2. In addition, I've written a nice function (I think) that will also allow gzip encoding (reduce bandwidth) if the server supports it. I use this for downloading social media files, but should work for anything.

I would try to debug your code, but since it's just a snippet (and the error messages are formatted badly), it's hard to know exactly where your error is occurring (it's certainly not line 97 in your code snippet).

This isn't as short as it could be, but it's clear and reusable. This is python 2.7, it looks like you're using 3 - in which case you google some other questions that address how to use urllib2 in python 3.

import urllib2
import gzip
from StringIO import StringIO

def download(url):
    """
    Download and return the file specified in the URL; attempt to use
    gzip encoding if possible.
    """
    request = urllib2.Request(url)
    request.add_header('Accept-Encoding', 'gzip')
    try:
        response = urllib2.urlopen(request)
    except Exception, e:
        raise IOError("%s(%s) %s" % (_ERRORS[1], url, e))
    payload = response.read()
    if response.info().get('Content-Encoding') == 'gzip':
        buf = StringIO(payload)
        f = gzip.GzipFile(fileobj=buf)
        payload = f.read()
    return payload

def save_media(filename, media):
    file_handle = open(filename, "wb")
    file_handle.write(media)
    file_handle.close()

title_2 = "10-cloverfield-lane"
media = download("http://www.joblo.com/timthumb.php?src=/posters/images/full/{}-poster1.jpg&h=333&w=225".format(title_2))
save_media("poster.jpg", media)