I have an application on Google App Engine (GAE) and I am using Python 2.7. This application receives a GET (AJAX) request from the user's browser (say Chrome). Upon receiving the request, I prepare asynchronous connections to request data from multiple websites (say X1, X2, etc.) outside GAE, using urlfetch.make_fetch_call() with GET requests.
This worked fine for website X1 but not for X2, so I started probing on the local dev server. I suspect that X2 checks the {'User-Agent': 'Python-urllib/2.7'} header. This is my best guess, since changing this field to {'User-Agent': 'Mozilla/5.0'} returns the desired results.
So I uploaded the code to GAE and started the process with urlfetch.make_fetch_call(). Upon intercepting this call I found that, no matter what I do, the default header added by GAE is not removed. Here is the default header added by GAE:
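To make the header difference concrete, here is a minimal sketch of how the User-Agent can be overridden on a plain urllib request, outside GAE. It only builds the request object and inspects it, so no network call is made. The URL is a placeholder, and the try/except import is there only so the snippet also runs on Python 3.

```python
try:
    import urllib2 as urlrequest  # Python 2.7
except ImportError:
    import urllib.request as urlrequest  # Python 3

# Default User-Agent that urllib attaches, e.g. 'Python-urllib/2.7'.
default_agent = dict(urlrequest.build_opener().addheaders)['User-agent']

# Placeholder URL; a User-Agent set on the request takes precedence
# over the opener's default when the request is sent.
req = urlrequest.Request('http://example.com/search/abc',
                         headers={'User-Agent': 'Mozilla/5.0'})

# urllib stores the header name internally as 'User-agent'.
spoofed_agent = req.get_header('User-agent')
print(default_agent)
print(spoofed_agent)
```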
302 218ms 0kb Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~xxx-etching-112014) module=default version=1 107.178.194.96 - - [06/Feb/2016:19:57:04 -0800] "GET / HTTP/1.1" 302 383 "http://www.mywebbsite.com/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~xxx-etching-112014)" "1.usedForIntercepting.appspot.com" ms=218 cpu_ms=224 cpm_usd=0.000043 loading_request=1 app_engine_release=1.9.32 trace_id=fd7b7420e7f8c23371a5b0ea7e9651 instance=00c61b117ce5ebac2a2eba44f26a01d4f2
This is what I have tried:
rpcs = []
for portal in self.searchPortals:
    spoofHeader = {
        'User-agent': 'Mozilla/5.0----------------------',
        'Host': portal.getURL(),
        'Accept-Encoding': 'identity',
        'Connection': 'close',
        'Accept': 'application/json, text/plain, */*',
        'Origin': 'http://www.mywebsite.com'
    }
    logging.info(spoofHeader)
    rpc = urlfetch.create_rpc(deadline=5)
    # Bind rpc and portal as default arguments so each callback keeps its own pair.
    rpc.callback = lambda rpc=rpc, portal=portal: self.handleCallBack(rpc, portal)
    #urlfetch.make_fetch_call(rpc, portal.getSearchURL(searchKeyword), headers={'User-agent':'Mozilla/5.0'})
    urlfetch.make_fetch_call(rpc, url='http://1.usedforintercepting.appspot.com', headers=spoofHeader)
    rpcs.append(rpc)

# Wait for every fetch to finish; callbacks fire as each RPC completes.
for rpc in rpcs:
    rpc.wait()
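A side note on the callback in the loop above, independent of the header problem: a lambda looks up loop variables when it is called, not when it is created, so a callback written as `lambda: self.handleCallBack(rpc, portal)` would see the last rpc and portal once the loop has finished. Binding them as default arguments avoids this. A minimal sketch with no GAE dependency, using made-up portal names:

```python
portals = ['X1', 'X2', 'X3']  # hypothetical portal names

late = []   # callbacks that read `portal` when called
bound = []  # callbacks that capture `portal` when created
for portal in portals:
    late.append(lambda: portal)
    bound.append(lambda portal=portal: portal)

# All callbacks run after the loop, as they would during rpc.wait().
late_results = [cb() for cb in late]
bound_results = [cb() for cb in bound]
print(late_results)
print(bound_results)
```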
This is what I received:
2016-02-07 13:01:21.306 / 302 59ms 0kb Mozilla/5.0---------------------- AppEngine-Google; (+http://code.google.com/appengine; appid: s~xxx-etching-112014) module=default version=1 107.178.194.20 - - [06/Feb/2016:23:31:21 -0800] "GET / HTTP/1.1" 302 383 - "Mozilla/5.0---------------------- AppEngine-Google; (+http://code.google.com/appengine; appid: s~xxx-etching-112014)" "1.usedForIntercepting.appspot.com" ms=59 cpu_ms=6 cpm_usd=0.000043 app_engine_release=1.9.32 trace_id=a4a1f521c5a6fa65ed0295835dd175 instance=00c61b117ce5ebac2a2eba44f26a01d4f2
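In the log entry above, the value I set appears to be kept, with GAE's own identifier appended after it. A small sketch that splits the received User-Agent into the part I sent and the part that was added. The marker string is taken from the log, and the helper name split_user_agent is made up for illustration.

```python
GAE_MARKER = 'AppEngine-Google;'  # suffix marker as it appears in the log


def split_user_agent(received):
    """Return (custom_part, appended_part) of a received User-Agent string."""
    custom, marker, rest = received.partition(GAE_MARKER)
    return custom.strip(), (marker + rest).strip()


received = ('Mozilla/5.0---------------------- AppEngine-Google; '
            '(+http://code.google.com/appengine; appid: s~xxx-etching-112014)')
custom, appended = split_user_agent(received)
print(custom)
print(appended)
```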
What I want is something like this:
GET http://somelink/search/abc HTTP/1.1
Accept-Encoding: identity
Host: somelink.com
Connection: close
User-Agent: Mozilla/5.0
Is there a way to remove everything from the User-Agent header other than Mozilla/5.0?
Note: to intercept the request made from GAE using urlfetch, I am using another GAE instance.