In Python, what are the differences between the urllib
, urllib2
, urllib3
and requests
modules? Why are there three? They seem to do the same thing...
11 Answers
I know it's been said already, but I'd highly recommend the requests
Python package.
If you've used languages other than python, you're probably thinking urllib
and urllib2
are easy to use, not much code, and highly capable, that's how I used to think. But the requests
package is so unbelievably useful and short that everyone should be using it.
First, it supports a fully restful API, and is as easy as:
import requests
resp = requests.get('http://www.mywebsite.com/user')
resp = requests.post('http://www.mywebsite.com/user')
resp = requests.put('http://www.mywebsite.com/user/put')
resp = requests.delete('http://www.mywebsite.com/user/delete')
Regardless of whether GET / POST, you never have to encode parameters again, it simply takes a dictionary as an argument and is good to go:
userdata = {"firstname": "John", "lastname": "Doe", "password": "jdoe123"}
resp = requests.post('http://www.mywebsite.com/user', data=userdata)
Plus it even has a built in JSON decoder (again, I know json.loads()
isn't a lot more to write, but this sure is convenient):
resp.json()
Or if your response data is just text, use:
resp.text
This is just the tip of the iceberg. This is the list of features from the requests site:
- International Domains and URLs
- Keep-Alive & Connection Pooling
- Sessions with Cookie Persistence
- Browser-style SSL Verification
- Basic/Digest Authentication
- Elegant Key/Value Cookies
- Automatic Decompression
- Unicode Response Bodies
- Multipart File Uploads
- Connection Timeouts
- .netrc support
- List item
- Python 2.6—3.4
- Thread-safe.
urllib2 provides some extra functionality, namely the urlopen()
function can allow you to specify headers (normally you'd have had to use httplib in the past, which is far more verbose.) More importantly though, urllib2 provides the Request
class, which allows for a more declarative approach to doing a request:
r = Request(url='http://www.mysite.com')
r.add_header('User-Agent', 'awesome fetcher')
r.add_data(urllib.urlencode({'foo': 'bar'})
response = urlopen(r)
Note that urlencode()
is only in urllib, not urllib2.
There are also handlers for implementing more advanced URL support in urllib2. The short answer is, unless you're working with legacy code, you probably want to use the URL opener from urllib2, but you still need to import into urllib for some of the utility functions.
Bonus answer With Google App Engine, you can use any of httplib, urllib or urllib2, but all of them are just wrappers for Google's URL Fetch API. That is, you are still subject to the same limitations such as ports, protocols, and the length of the response allowed. You can use the core of the libraries as you would expect for retrieving HTTP URLs, though.
This is my understanding of what the relations are between the various "urllibs":
In the Python 2 standard library there exist two HTTP libraries side-by-side. Despite the similar name, they are unrelated: they have a different design and a different implementation.
urllib
was the original Python HTTP client, added to the standard library in Python 1.2. Earlier documentation forurllib
can be found in Python 1.4.urllib2
was a more capable HTTP client, added in Python 1.6, intended as a replacement forurllib
:urllib2 - new and improved but incompatible version of urllib (still experimental).
Earlier documentation for
urllib2
can be found in Python 2.1.
The Python 3 standard library has a new urllib
which is a merged/refactored/rewritten version of the older modules.
urllib3
is a third-party package (i.e., not in CPython's standard library). Despite the name, it is unrelated to the standard library packages, and there is no intention to include it in the standard library in the future.
Finally, requests
internally uses urllib3
, but it aims for an easier-to-use API.
urllib and urllib2 are both Python modules that do URL request related stuff but offer different functionalities.
1) urllib2 can accept a Request object to set the headers for a URL request, urllib accepts only a URL.
2) urllib provides the urlencode method which is used for the generation of GET query strings, urllib2 doesn't have such a function. This is one of the reasons why urllib is often used along with urllib2.
Requests - Requests’ is a simple, easy-to-use HTTP library written in Python.
1) Python Requests encodes the parameters automatically so you just pass them as simple arguments, unlike in the case of urllib, where you need to use the method urllib.encode() to encode the parameters before passing them.
2) It automatically decoded the response into Unicode.
3) Requests also has far more convenient error handling.If your authentication failed, urllib2 would raise a urllib2.URLError, while Requests would return a normal response object, as expected. All you have to see if the request was successful by boolean response.ok
Just to add to the existing answers, I don't see anyone mentioning that python requests is not a native library. If you are ok with adding dependencies, then requests is fine. However, if you are trying to avoid adding dependencies, urllib is a native python library that is already available to you.
To get the content of a url:
try: # Try importing requests first.
import requests
except ImportError:
try: # Try importing Python3 urllib
import urllib.request
except AttributeError: # Now importing Python2 urllib
import urllib
def get_content(url):
try: # Using requests.
return requests.get(url).content # Returns requests.models.Response.
except NameError:
try: # Using Python3 urllib.
with urllib.request.urlopen(index_url) as response:
return response.read() # Returns http.client.HTTPResponse.
except AttributeError: # Using Python3 urllib.
return urllib.urlopen(url).read() # Returns an instance.
It's hard to write Python2 and Python3 and request
dependencies code for the responses because they urlopen()
functions and requests.get()
function return different types:
- Python2
urllib.request.urlopen()
returns ahttp.client.HTTPResponse
- Python3
urllib.urlopen(url)
returns aninstance
- Request
request.get(url)
returns arequests.models.Response
I think all answers are pretty good. But fewer details about urllib3.urllib3 is a very powerful HTTP client for python. For installing both of the following commands will work,
urllib3
using pip,
pip install urllib3
or you can get the latest code from Github and install them using,
$ git clone git://github.com/urllib3/urllib3.git
$ cd urllib3
$ python setup.py install
Then you are ready to go,
Just import urllib3 using,
import urllib3
In here, Instead of creating a connection directly, You’ll need a PoolManager instance to make requests. This handles connection pooling and thread-safety for you. There is also a ProxyManager object for routing requests through an HTTP/HTTPS proxy Here you can refer to the documentation. example usage :
>>> from urllib3 import PoolManager
>>> manager = PoolManager(10)
>>> r = manager.request('GET', 'http://google.com/')
>>> r.headers['server']
'gws'
>>> r = manager.request('GET', 'http://yahoo.com/')
>>> r.headers['server']
'YTS/1.20.0'
>>> r = manager.request('POST', 'http://google.com/mail')
>>> r = manager.request('HEAD', 'http://google.com/calendar')
>>> len(manager.pools)
2
>>> conn = manager.connection_from_host('google.com')
>>> conn.num_requests
3
As mentioned in urrlib3
documentations,urllib3
brings many critical features that are missing from the Python standard libraries.
- Thread safety.
- Connection pooling.
- Client-side SSL/TLS verification.
- File uploads with multipart encoding.
- Helpers for retrying requests and dealing with HTTP redirects.
- Support for gzip and deflate encoding.
- Proxy support for HTTP and SOCKS.
- 100% test coverage.
Follow the user guide for more details.
- Response content (The HTTPResponse object provides status, data, and header attributes)
- Using io Wrappers with Response content
- Creating a query parameter
- Advanced usage of urllib3
requests
requests uses urllib3
under the hood and make it even simpler to make requests
and retrieve data.
For one thing, keep-alive is 100% automatic, compared to urllib3
where it's not. It also has event hooks which call a callback function when an event is triggered, like receiving a response
In requests
, each request type has its own function. So instead of creating a connection or a pool, you directly GET a URL.
For install requests
using pip just run
pip install requests
or you can just install from source code,
$ git clone git://github.com/psf/requests.git
$ cd requests
$ python setup.py install
Then, import requests
Here you can refer the official documentation, For some advanced usage like session object, SSL verification, and Event Hooks please refer to this url.
You should generally use urllib2, since this makes things a bit easier at times by accepting Request objects and will also raise a URLException on protocol errors. With Google App Engine though, you can't use either. You have to use the URL Fetch API that Google provides in its sandboxed Python environment.
A key point that I find missing in the above answers is that urllib returns an object of type <class http.client.HTTPResponse>
whereas requests
returns <class 'requests.models.Response'>
.
Due to this, read() method can be used with urllib
but not with requests
.
P.S. : requests
is already rich with so many methods that it hardly needs one more as read()
;>
urllib
in Python 3 is yet another option, cleaned up in various ways. But thankfully the official documentation also notes that "The Requests package is recommended for a higher-level HTTP client interface." at 21.6. urllib.request — Extensible library for opening URLs — Python 3.6.3 documentation – nealmcburllib3
is and howurllib3
is different from the officialurllib
module. – Rick