http - How to download a file using python in a 'smarter' way?

Question

I need to download several files via http in Python.

The most obvious way to do it is just using urllib2:

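import urllib2
u = urllib2.urlopen('http://server.com/file.html')
# Open in binary mode so the bytes are written unchanged (matters for PDFs and other non-text files)
localFile = open('file.html', 'wb')
localFile.write(u.read())
localFile.close()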

But I'll also have to deal with URLs that are awkward in some way, like this one: http://server.com/!Run.aspx/someoddtext/somemore?id=121&m=pdf. When downloaded via a browser, the file has a human-readable name, e.g. accounts.pdf.

Is there any way to handle that in Python, so I don't need to know the file names and hardcode them into my script?

Is the filename on the server relevant? Presumably these files have some meaning to you, so you ought to be able to name them yourself. If the names don't have meaning, come up with a random unique name yourself (uuids perhaps?) — Dominic Rodger
I'd love to have file names that are readable and meaningful. The issue is that the script will take the URLs to download from a text file, and the URLs will be added and removed by a non-technical person. — kender

Oli · Accepted Answer · 2009-05-14T08:28:43

Download scripts like that tend to send a header telling the user agent what to name the file:

Content-Disposition: attachment; filename="the filename.ext"

If you can grab that header, you can get the proper filename.

There's another thread that has a little bit of code to offer up for Content-Disposition-grabbing.

import urllib2
remotefile = urllib2.urlopen('http://example.com/somefile.zip')
# The header value, or None if the server did not send one
disposition = remotefile.info().get('Content-Disposition')
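
Once you have the header value you still need to pull the filename out of it, and cope with servers that don't send the header at all. Here is a minimal sketch of one way to do that, not a definitive implementation. The helper name pick_filename, its default argument, and the fallback to the last segment of the URL path are my own assumptions, and the imports are written for Python 3 (on Python 2 the urlparse import comes from the urlparse module instead).

```python
import posixpath
from email.message import Message
from urllib.parse import urlparse  # Python 3; on Python 2: from urlparse import urlparse


def pick_filename(url, content_disposition=None, default="download"):
    """Hypothetical helper: choose a local filename for a download."""
    name = None
    if content_disposition:
        # Let the stdlib email parser deal with quoting in the header value.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
    if not name:
        # No usable header: fall back to the last segment of the URL path.
        name = posixpath.basename(urlparse(url).path)
    # Drop any directory components so the server cannot choose where we write.
    name = posixpath.basename(name.replace("\\", "/"))
    return name or default


if __name__ == "__main__":
    url = "http://server.com/!Run.aspx/someoddtext/somemore?id=121&m=pdf"
    print(pick_filename(url, 'attachment; filename="accounts.pdf"'))
    print(pick_filename(url))
```

You would pass it the URL plus whatever the Content-Disposition lookup returned, then open the result in 'wb' mode and write the response body to it. Stripping directory components is a deliberate choice: the filename comes from the remote server, so it shouldn't be trusted as a path.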
