3
votes

Part of an app I'm building needs to check RSS feeds for updates. I'm looking for a reliable way to know if a feed has new entries.

I know that people sometimes publish posts dated in the future and then publish posts dated in the present, which could cause some entries to be hidden. It seems like there could be more complications than that, as well. I also know that hashing the title or content would give poor performance and unreliable results, since those can change without indicating a new entry. And I know that a few years ago, when I was maintaining a podcast RSS feed manually, I never changed the item.

So, I need some way to reliably check RSS, Atom, etc. feeds for new entries since they were last checked.

Specifically, this application will be written in Python for Google App Engine using Universal Feed Parser, but I doubt that matters too much in this case.

2 Answers

1
votes

You can use a conditional GET by adding an If-Modified-Since header to your HTTP request. Well-behaved servers will return a 304 Not Modified if there are no changes.
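As a rough sketch of that idea, the snippet below uses only the Python standard library. The function names `build_conditional_request` and `fetch_if_changed` are made up for illustration, and sending an ETag via If-None-Match is an addition beyond the If-Modified-Since header mentioned above. It assumes you stored the Last-Modified and ETag values from the previous fetch.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, last_modified=None, etag=None):
    """Build a GET request carrying the validators saved from the previous fetch."""
    request = urllib.request.Request(url)
    if last_modified:
        # Value should be the Last-Modified string the server sent last time
        request.add_header("If-Modified-Since", last_modified)
    if etag:
        # Assumption: the server also supports ETags (not mentioned in the answer)
        request.add_header("If-None-Match", etag)
    return request


def fetch_if_changed(url, last_modified=None, etag=None):
    """Return (body, last_modified, etag); body is None when the server answers 304."""
    request = build_conditional_request(url, last_modified, etag)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return (
                response.read(),
                response.headers.get("Last-Modified"),
                response.headers.get("ETag"),
            )
    except urllib.error.HTTPError as error:
        # urllib reports 304 Not Modified as an HTTPError
        if error.code == 304:
            return None, last_modified, etag
        raise
```

A 304 only tells you the feed document is unchanged. A 200 means something changed, not necessarily that there are new entries, so you would still need to compare entries afterwards. Servers that ignore these headers will simply return the full feed every time.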

1
votes

Feed items have a unique ID and/or a URL that is likely to be unique. Hash only those together to get a quick and reasonable way to detect changes. But the only way to be absolutely sure would be to hash the content, as you said.
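Something along these lines might work as a starting point. It assumes each entry is dict-like with optional "id" and "link" keys (which is roughly how Universal Feed Parser exposes entries), and that you persist the set of seen keys between checks. The helper names `entry_key` and `find_new_entries` are hypothetical.

```python
import hashlib


def entry_key(entry):
    """Hash an entry's id and link into one stable key."""
    # Assumption: entries are dict-like with optional "id" and "link" fields
    raw = "\x00".join([entry.get("id") or "", entry.get("link") or ""])
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def find_new_entries(entries, seen_keys):
    """Return entries whose key is not in seen_keys, and record them as seen."""
    new_entries = []
    for entry in entries:
        key = entry_key(entry)
        if key not in seen_keys:
            seen_keys.add(key)
            new_entries.append(entry)
    return new_entries
```

Because this compares keys rather than dates, a post dated in the future should not hide later posts, and an edited title or body does not show up as a new entry.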