4
votes

I want to get the HTML of Windows Phone market pages. Until now I had not run into any problems, but since today the following error is returned every time I retrieve data.

[...] Your request appears to be from an automated process. If this is incorrect, notify us by clicking here to be redirected [...].

I tried using a proxy in case too many requests were coming from one IP, but that made no difference. Do you know why this happens, or have any ideas for a workaround? Any help would be very much appreciated. The main goal is to somehow get information about Windows Phone apps from the market.

What technique do you use to get the HTML? Which command, language, or method? Apparently this restriction has been active since 2014-07-09; that is when my PHP script (file_get_contents) stopped working. curl gets blocked as well. - malte
I use C#. I have been working on my application for the last month, but today, after a couple of calls to retrieve the HTML, I received that warning. - Maximus
Maybe it's possible to trick the page by loading it into a WebView. From there you should be able to invoke the script document.documentElement.outerHTML and get the contents for your application. Just an idea that came to mind ... - malte
What could they be blocking on, if not the IP? I am writing an ASP.NET application, so a WebView is not accessible. - Maximus

1 Answer

4
votes

It seems that they inspect the User-Agent header and block the request if it does not look like a known browser or device. I managed to make it work with curl by sending a browser user agent, e.g. curl -A 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9' http://www.windowsphone.com/en-us/store/app/pinpoint-by-foundbite/ff9fdf41-aabd-4cac-9086-8710bd327da9

For ASP.NET, if you use HttpWebRequest to get the HTML content, try the following:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9";
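To put the two lines above in context, here is a minimal sketch of a complete fetch, assuming the classic System.Net API. The class name StorePageFetcher and its method names are made up for illustration. The user agent string is the one from this answer, and whether the site keeps accepting it is not guaranteed.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper class; the name is illustrative only.
public static class StorePageFetcher
{
    // Browser-like user agent string taken from the answer above.
    public const string BrowserUserAgent =
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9";

    // Builds the request without sending it, so the header can be checked first.
    public static HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = BrowserUserAgent;
        return request;
    }

    // Sends the request and returns the response body as a string.
    // GetResponse() throws a WebException on HTTP error statuses.
    public static string FetchHtml(string url)
    {
        HttpWebRequest request = CreateRequest(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling StorePageFetcher.FetchHtml(url) with the store page URL should then return the page markup, provided the block really is based on the user agent alone.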

For PHP you can set the user agent as well, via curl_setopt with the CURLOPT_USERAGENT option.

I was not able to find out whether there is also an IP-based block after several requests.