I have an application on Google App Engine (GAE) running Python 2.7. The application receives a GET (Ajax) request from the user's browser (say Chrome). On receiving the request, it prepares asynchronous connections to fetch data from several websites outside GAE (say X1, X2, etc.) by issuing GET requests with urlfetch.make_fetch_call().

This worked fine for website X1 but not for X2, so I started probing on the local dev server. I suspect that X2 is checking for the {'User-Agent':'Python-urllib/2.7'} header. This is my best guess, since changing this field to {'User-Agent': 'Mozilla/5.0'} returns the desired results.

So I uploaded the code to GAE and made the request with urlfetch.make_fetch_call(). On intercepting this call, I found that no matter what I do, the default header added by GAE is not removed. Here is the default header added by GAE:

302 218ms 0kb Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~xxx-etching-112014) module=default version=1 107.178.194.96 - - [06/Feb/2016:19:57:04 -0800] "GET / HTTP/1.1" 302 383 "http://www.mywebbsite.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~xxx-etching-112014)" "1.usedForIntercepting.appspot.com" ms=218 cpu_ms=224 cpm_usd=0.000043 loading_request=1 app_engine_release=1.9.32 trace_id=fd7b7420e7f8c23371a5b0ea7e9651 instance=00c61b117ce5ebac2a2eba44f26a01d4f2

This is what I have tried:

# Assumes: import logging; from google.appengine.api import urlfetch
rpcs = []
for portal in self.searchPortals:
    spoofHeader = {
        'User-agent': 'Mozilla/5.0----------------------',
        'Host': portal.getURL(),
        'Accept-Encoding': 'identity',
        'Connection': 'close',
        'Accept': 'application/json, text/plain, */*',
        'Origin': 'http://www.mywebsite.com',
    }
    logging.info(spoofHeader)
    rpc = urlfetch.create_rpc(deadline=5)
    # Bind rpc and portal as default arguments so each callback keeps its own
    # pair; a plain lambda would see only the last values from the loop.
    rpc.callback = lambda rpc=rpc, portal=portal: self.handleCallBack(rpc, portal)
    # urlfetch.make_fetch_call(rpc, portal.getSearchURL(searchKeyword), headers={'User-agent': 'Mozilla/5.0'})
    urlfetch.make_fetch_call(rpc, url='http://1.usedforintercepting.appspot.com', headers=spoofHeader)
    rpcs.append(rpc)

# Wait for every asynchronous fetch to finish.
for rpc in rpcs:
    rpc.wait()

This is what I received:

2016-02-07 13:01:21.306 / 302 59ms 0kb Mozilla/5.0---------------------- AppEngine-Google; (+http://code.google.com/appengine; appid: s~xxx-etching-112014) module=default version=1 107.178.194.20 - - [06/Feb/2016:23:31:21 -0800] "GET / HTTP/1.1" 302 383 - "Mozilla/5.0---------------------- AppEngine-Google; (+http://code.google.com/appengine; appid: s~xxx-etching-112014)" "1.usedForIntercepting.appspot.com" ms=59 cpu_ms=6 cpm_usd=0.000043 app_engine_release=1.9.32 trace_id=a4a1f521c5a6fa65ed0295835dd175 instance=00c61b117ce5ebac2a2eba44f26a01d4f2

What I want is something like this:

GET http://somelink/search/abc HTTP/1.1
Accept-Encoding: identity
Host: somelink.com
Connection: close
User-Agent: Mozilla/5.0

How can I remove everything from the header other than User-Agent: Mozilla/5.0?

Note: to intercept the request made from GAE using urlfetch, I am using another GAE instance.

1 Answer

The documentation, URL Fetch Python API Overview: Request Headers, says:

For security reasons, the following headers cannot be modified by the application:

  • Content-Length
  • Host
  • Vary
  • Via
  • X-Appengine-Inbound-Appid
  • X-Forwarded-For
  • X-ProxyUser-IP

It also says:

The following headers indicate the app ID of the requesting app:

User-Agent. This header can be modified but App Engine will append an identifier string to allow servers to identify App Engine requests. The appended string has the format "AppEngine-Google; (+http://code.google.com/appengine; appid: APPID)", where APPID is your app's identifier.

If you want custom headers, you will have to write your own urlfetch code or use an outside server that makes the call for you with your headers.
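
To make the two documented rules quoted above concrete, here is a small local simulation. It is only a sketch and not App Engine code: simulate_outgoing_headers is a hypothetical helper, and it assumes that a restricted header supplied by the app is simply ignored and that the identifier alone is sent when no User-Agent is supplied. The app ID is the one from the logs in the question.

```python
# Local illustration only: mimics the documented rules, not App Engine itself.

# Headers the URL Fetch documentation lists as not modifiable by the application.
RESTRICTED_HEADERS = frozenset(h.lower() for h in (
    'Content-Length', 'Host', 'Vary', 'Via',
    'X-Appengine-Inbound-Appid', 'X-Forwarded-For', 'X-ProxyUser-IP',
))

# Format of the identifier the documentation says is appended to User-Agent.
UA_SUFFIX = 'AppEngine-Google; (+http://code.google.com/appengine; appid: %s)'


def simulate_outgoing_headers(requested, app_id):
    """Hypothetical helper: headers a remote server would plausibly see."""
    outgoing = {}
    user_agent = None
    for name, value in requested.items():
        lowered = name.lower()
        if lowered in RESTRICTED_HEADERS:
            continue  # assumption: the app-supplied value is ignored
        if lowered == 'user-agent':
            user_agent = value  # handled after the loop
            continue
        outgoing[name] = value
    suffix = UA_SUFFIX % app_id
    # Assumption: with no custom User-Agent, only the identifier is sent.
    outgoing['User-Agent'] = '%s %s' % (user_agent, suffix) if user_agent else suffix
    return outgoing


headers = simulate_outgoing_headers(
    {'User-agent': 'Mozilla/5.0', 'Host': 'somelink.com', 'Accept-Encoding': 'identity'},
    's~xxx-etching-112014')
print(headers['User-Agent'])
```

Under these assumptions the printed User-Agent has the same shape as the one in the intercepted log line from the question: the custom value followed by the AppEngine-Google identifier, with the custom Host value dropped.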