I am trying to read the RSS feed at http://www.youm7.com/new3agelrss.asp using the code from Lars Vogel's "RSS feeds with Java - Tutorial".

I managed to read other RSS feeds using the exact same code without problems.

For this link I get

Server returned HTTP response code: 403 for URL: http://www.youm7.com/new3agelrss.asp

Following the question "java.io.IOException: Server returned HTTP response code: 403 for URL", I edited the private InputStream read() method as follows:

private InputStream read() {
    try {
        // Send a browser-like User-Agent; without it the request was answered with 403.
        HttpURLConnection httpcon = (HttpURLConnection) url.openConnection();
        httpcon.addRequestProperty("User-Agent", "Mozilla/4.76");
        return httpcon.getInputStream();
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

Now I am getting:

Server returned HTTP response code: 503 for URL: http://www.youm7.com/new3agelrss.asp

This happens even though I can open that URL in a web browser (you can try it yourself). Please help. I am open to any alternative approach. Thanks in advance.

Note:

I installed two RSS reader applications on my Mac (RSS Notifier and RSS Bot), and both managed to read that feed.

1 Answer

Your code is probably fine for most sites. However, this site appears to run some JavaScript on the page before redirecting (possibly to stop people from scraping it). So I'm not sure this will work unless you can set the cookie that they set, or emulate the response they expect (I'm also not sure whether you have access to anything that could run the JavaScript to get around it).
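To see what the server actually sends back with the 503, something like the following might help. It is only a minimal diagnostic sketch, not a fix: the class name FeedProbe and its methods are made up for illustration, the body is assumed to be UTF-8, and the timeouts are arbitrary. It uses the standard java.net cookie manager so that any cookie the site sets is kept for later requests, and it reads the error stream instead of letting getInputStream() throw.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the tutorial code.
class FeedProbe {

    /** Status code plus response body, whichever stream the server used. */
    static final class Result {
        final int status;
        final String body;

        Result(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    /** Installs an in-memory cookie store used by every HttpURLConnection in this JVM. */
    static void enableCookies() {
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
    }

    /** Fetches the URL and returns status and body without throwing on 4xx/5xx. */
    static Result fetch(URL url, String userAgent) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestProperty("User-Agent", userAgent);
        con.setConnectTimeout(10_000); // arbitrary timeouts for the sketch
        con.setReadTimeout(10_000);
        try {
            int status = con.getResponseCode();
            // For error statuses the body is on getErrorStream(); getInputStream() would throw.
            InputStream in = status >= 400 ? con.getErrorStream() : con.getInputStream();
            String body = (in == null) ? "" : readAll(in);
            return new Result(status, body);
        } finally {
            con.disconnect();
        }
    }

    /** Reads the whole stream, assuming UTF-8 content. */
    private static String readAll(InputStream in) throws IOException {
        try (InputStream src = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = src.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        enableCookies();
        Result r = fetch(new URL("http://www.youm7.com/new3agelrss.asp"), "Mozilla/4.76");
        System.out.println("HTTP " + r.status);
        System.out.println(r.body); // likely the JavaScript page rather than the feed
    }
}
```

If the printed body turns out to be an HTML page containing a script instead of RSS XML, that would fit the JavaScript check described above, and keeping cookies alone is unlikely to be enough.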

Update: Looking at it, there is a bit of challenge/response in there, which poses a question (such as some numbers that need adding up). You could possibly scrape the original page, do the calculation, and post a form back with the answer. I'm not sure I want to post a solution to this, though, as the code looks like it is there precisely to stop this. Plus, they could easily change the challenge question or format. So running the JavaScript somehow would probably be the best way, if possible.