2
votes

I'm trying to write a webserver (proxy?) so that I can make requests to, say, http://localhost:8080/foo/bar which would transparently return the response from https://www.gyford.com/foo/bar.

The python script below works for a web page itself, but some kinds of files aren't returned (e.g. https://www.gyford.com/static/hines/js/site-340675b4c7.min.js ). If I manually request that file, while this server's running, like:

import requests
r = requests.get('http://localhost:8080/static/hines/js/site-340675b4c7.min.js')

then I get:

'Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check',)

So I guess I need to handle gzipped files differently but I can't work out how.

from http.server import HTTPServer, BaseHTTPRequestHandler
import requests

HOST_NAME = 'localhost'
PORT_NUMBER = 8080
TARGET_DOMAIN = 'www.gyford.com'

class MyHandler(BaseHTTPRequestHandler):

    def do_GET(self):
        host_domain = '{}:{}'.format(HOST_NAME, PORT_NUMBER)

        host = self.headers.get('Host').replace(host_domain, TARGET_DOMAIN)

        url = ''.join(['https://', host, self.path])

        r = requests.get(url)

        self.send_response(r.status_code)

        for k,v in r.headers.items():
            self.send_header(k, v)

        self.end_headers()

        self.wfile.write( bytes(r.text, 'UTF-8') )

if __name__ == '__main__':
    server_class = HTTPServer
    httpd = server_class((HOST_NAME, PORT_NUMBER), MyHandler)
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
    httpd.server_close()

EDIT: Here's the output of print(r.headers):

{'Connection': 'keep-alive', 'Server': 'gunicorn/19.7.1', 'Date': 'Wed, 26 Sep 2018 13:43:43 GMT', 'Content-Type': 'application/javascript; charset="utf-8"', 'Cache-Control': 'max-age=60, public', 'Access-Control-Allow-Origin': '*', 'Vary': 'Accept-Encoding', 'Last-Modified': 'Thu, 20 Sep 2018 16:11:29 GMT', 'Etag': '"5ba3c6b1-6be"', 'Content-Length': '771', 'Content-Encoding': 'gzip', 'Via': '1.1 vegur'}

1
To be clear: is that output from the client's r = requests.get(...)? The r.headers look OK for a gzip file, but you requested a .js file? Please explain. Also verify: is 'Content-Length': '771' the length of the gzip data or of the .js? - stovfl
The len() of both r.text and r.content is 1726. Yes, I request a .js file, which is served gzipped via transport-level compression. - Phil Gyford

1 Answer

0
votes

Question: I need to handle gzipped files differently.

I wonder how this could work for a web page at all; presumably the browser is forgiving about the mismatched encoding.


What you are doing:

    r = requests.get(url)

requests fetches the URL content and automatically decodes the gzip and deflate transfer encodings for you.

    self.wfile.write( bytes(r.text, 'UTF-8') )

You then write out the decoded r.text, re-encoded as UTF-8 bytes. That body is no longer gzip-compressed, but the forwarded Content-Encoding: gzip header claims it is, so the client's attempt to gunzip plain text fails with the "incorrect header check" error.
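The failure can be reproduced with the stdlib gzip module (the byte strings here are made up for illustration):

```python
import gzip

body = b'console.log("site");'   # stands in for the decoded .js text
wire = gzip.compress(body)       # stands in for what the origin server sent

# requests decodes for you: r.content corresponds to `body`, not `wire`.
assert gzip.decompress(wire) == body

# Re-serving `body` under a 'Content-Encoding: gzip' header makes the
# client try to gunzip plain text, which fails the gzip header check:
try:
    gzip.decompress(body)
    raise AssertionError('expected a decompression error')
except OSError:                  # gzip.BadGzipFile on Python 3.8+
    pass
```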

Change the following: read and write the response as a raw stream of bytes, which does not transform the content, so the body still matches the forwarded headers.
You can use this for other data too, e.g. HTML responses.

    r = requests.get(url, stream=True)
    ...
    self.wfile.write(r.raw.read())

Note from docs.python-requests.org:
Also read the chapter on Raw Response Content.
If you want to stream very large responses, read and write in chunks instead of calling read() once.
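That chunked copy can be sketched with shutil.copyfileobj, which reads and writes one block at a time; io.BytesIO stands in here for r.raw and self.wfile:

```python
import io
import shutil

# Stand-ins: `source` plays the role of r.raw, `sink` plays self.wfile.
source = io.BytesIO(b'x' * 300_000)
sink = io.BytesIO()

# Copy in 64 KiB chunks so the whole body is never held in memory at once.
shutil.copyfileobj(source, sink, length=64 * 1024)

assert sink.getvalue() == b'x' * 300_000
```

In the handler that would be `shutil.copyfileobj(r.raw, self.wfile)` in place of the single `self.wfile.write(r.raw.read())`.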

Note: these are the default headers python-requests sends.
There is already an 'Accept-Encoding': 'gzip, deflate' header, so no action is needed on the client side.

{'headers': {'Accept': '*/*', 
 'User-Agent': 'python-requests/2.11.1', 
 'Accept-Encoding': 'gzip, deflate', 
 'Connection': 'close', 
 'Host': 'httpbin.org'}
}
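You can confirm those defaults locally, without any network traffic, via requests.utils.default_headers() (the exact User-Agent and Connection values vary by requests version):

```python
import requests

headers = requests.utils.default_headers()

# gzip is advertised out of the box, so the client needs no change.
assert 'gzip' in headers['Accept-Encoding']
print(dict(headers))
```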