With my Firefox browser I log in to a download site and click one of its query buttons. A small window named "Opening report1.csv" pops up, and I can choose 'Open with' or 'Save File'. I save the file.
For this action, Live HTTP Headers shows me:
https://myserver/ReportPage?download&NAME=ALL&DATE=THISYEAR
GET /ReportPage?download&NAME=ALL&DATE=THISYEAR HTTP/1.1
Host: myserver
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.8,de-DE;q=0.5,de;q=0.3
Accept-Encoding: gzip, deflate, br
Referer: https://myserver/ReportPage?4&NAME=ALL&DATE=THISYEAR
Cookie: JSESSIONID=88DEDBC6880571FDB0E6E4112D71B7D6
Connection: keep-alive
Upgrade-Insecure-Requests: 1

HTTP/1.1 200 OK
Date: Sat, 30 Dec 2017 22:37:40 GMT
Server: Apache-Coyote/1.1
Last-Modified: Sat, 30 Dec 2017 22:37:40 GMT
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Pragma: no-cache
Cache-Control: no-cache, no-store
Content-Disposition: attachment; filename="report1.csv"; filename*=UTF-8''report1.csv
Content-Type: text/csv
Content-Length: 332369
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
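(As an aside: once a response does carry this Content-Disposition header, the suggested filename can be recovered with the standard library's email parser; the header value below is copied from the trace above.)

```python
from email.message import Message

# Parse the Content-Disposition value from the browser trace.
msg = Message()
msg['Content-Disposition'] = (
    'attachment; filename="report1.csv"; filename*=UTF-8\'\'report1.csv'
)
print(msg.get_filename())  # report1.csv
```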
Now I try to emulate this with requests:
$ python3
>>> import requests
>>> from lxml import html
>>>
>>> s = requests.Session()
>>> s.verify = './myserver.crt' # certificate of myserver for https
>>>
>>> # get the login web page to enter username and password
... r = s.get( 'https://myserver' )
>>>
>>> # Get the URL for logging in: it's the action attribute of the login form.
... # We use XPath.
... tree = html.fromstring(r.text)
>>> loginUrl = 'https://myserver/' + tree.xpath("//form[@id='id4']/@action")[0]
>>> print( loginUrl ) # it contains a session-id
https://myserver/./;jsessionid=77EA70CB95252426439097E274286966?0-1.loginForm
>>>
>>> # logging in with username and password
... r = s.post( loginUrl, data = {'username':'ingo','password':'mypassword'} )
>>> print( r.status_code )
200
>>> # try to get the download file using url from Live HTTP headers
... downloadQueryUrl = 'https://myserver/ReportPage?download&NAME=ALL&DATE=THISYEAR'
>>> r = s.get( downloadQueryUrl )
>>> print( r.status_code)
200
>>> print( r.headers )
{'Connection': 'Keep-Alive',
'Date': 'Sun, 31 Dec 2017 14:46:03 GMT',
'Cache-Control': 'no-cache, no-store',
'Keep-Alive': 'timeout=5, max=94',
'Transfer-Encoding': 'chunked',
'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT',
'Pragma': 'no-cache',
'Content-Encoding': 'gzip',
'Content-Type': 'text/html;charset=UTF-8',
'Server': 'Apache-Coyote/1.1',
'Vary': 'Accept-Encoding'}
>>> print( r.url )
https://myserver/ReportPage?4&NAME=ALL&DATE=THISYEAR
>>>
The request succeeds, but I don't get the file download page. There is no "Content-Disposition: attachment;" entry in the headers. I only get the page the query starts from, i.e. the page named in the Referer header.
Has this something to do with the session cookie? requests seems to manage that automagically. Is there special handling for CSV files? Do I have to use streams? Is the download URL shown by Live HTTP Headers the right one? Maybe it is created dynamically?
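A quick way to diagnose this is to write the unexpected HTML body to a file and open that file in a browser; a login, error, or warning page would then be visible at a glance. A minimal helper sketch (the function name is mine):

```python
def dump_response(resp, path='response.html'):
    """Write the HTML body of an unexpected response to a file
    so it can be inspected in a browser."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(resp.text)
```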
How can I get a web page with "Content-Disposition: attachment;" from myserver and download its file with requests?
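One plausible explanation, judging from the `;jsessionid=…` and `?0-1.loginForm` fragments (which look like Apache Wicket), is that the download link is generated per session and cannot be hard-coded from an old trace. A sketch under that assumption: fetch the report page first, scrape the current download link out of it, and stream the CSV to disk. The XPath, the URLs, and the function name are my guesses, not taken from the real site.

```python
from lxml import html

def download_report(session, page_url, base='https://myserver/', out='report1.csv'):
    """Fetch the report page, extract the session-specific download link,
    and stream the CSV to disk. Returns the saved filename.

    Assumption: the download link is rebuilt per session, so it must be
    scraped fresh from the page each time instead of reused from a trace."""
    r = session.get(page_url)
    tree = html.fromstring(r.text)
    # The XPath is an assumption -- adjust it to the page's real markup.
    href = tree.xpath("//a[contains(@href,'download')]/@href")[0]
    with session.get(base + href, stream=True) as dl:
        dl.raise_for_status()
        if 'Content-Disposition' not in dl.headers:
            raise RuntimeError('got an HTML page, not the CSV attachment')
        with open(out, 'wb') as f:
            # Stream in chunks so large reports are not held in memory.
            for chunk in dl.iter_content(chunk_size=8192):
                f.write(chunk)
    return out
```

Used with the logged-in session from above, this would be called as `download_report(s, 'https://myserver/ReportPage?NAME=ALL&DATE=THISYEAR')`.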
Comments:
- r.text? Maybe there is useful information, i.e. it can be a warning message. You could write it to a file and open that file in a browser. – furas
- mechanize Python module. – Patrick Mevzek
- downloadQueryUrl from the download page. Maybe we have a dynamic creation. – Ingo