I'm writing an application which takes data input from a series of arbitrary RSS feeds. The feeds are polled asynchronously in the background and a method is called every time a new item is added to the feed.
My problem is identifying the new items in the feed. What's the best way to do it? I have come up with a few ideas, but they're all flawed.
Suggestion: Every time you poll, keep all items newer than the pubDate of the last item in the last poll. Problem: pubDate is not a required field, so some feeds will have items with no date to compare.
Suggestion: Keep a hash of the content for every item you return, and do not return content with the same hash. Problem: The set of stored hashes rapidly grows out of control in terms of memory usage.
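To make the two ideas concrete, here is a rough Python sketch of both as I currently picture them. The `Item` type and the function names are made up for illustration and do not come from any feed library. I'm assuming each item has already been parsed into a title, some content, and an optional pubDate, and I'm reading "the last item in the last poll" as the newest pubDate seen so far.

```python
import hashlib
from datetime import datetime, timezone
from typing import List, NamedTuple, Optional, Set, Tuple


class Item(NamedTuple):
    """Hypothetical parsed feed item; pub_date is None when the feed omits pubDate."""
    title: str
    content: str
    pub_date: Optional[datetime]


def new_items_by_pub_date(
    items: List[Item], last_seen: Optional[datetime]
) -> Tuple[List[Item], Optional[datetime]]:
    """Suggestion 1: keep items strictly newer than the newest pubDate seen so far.

    Items without a pubDate cannot be compared and are silently dropped,
    which is exactly the flaw described above.
    """
    fresh = [
        item for item in items
        if item.pub_date is not None
        and (last_seen is None or item.pub_date > last_seen)
    ]
    # Carry the newest known date forward to the next poll.
    candidates = [item.pub_date for item in items if item.pub_date is not None]
    if last_seen is not None:
        candidates.append(last_seen)
    return fresh, max(candidates, default=None)


def new_items_by_hash(items: List[Item], seen_hashes: Set[str]) -> List[Item]:
    """Suggestion 2: return only items whose content hash has not been seen.

    seen_hashes is mutated in place and never shrinks, so it grows with
    every distinct item ever returned.
    """
    fresh = []
    for item in items:
        digest = hashlib.sha256(item.content.encode("utf-8")).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            fresh.append(item)
    return fresh
```

In the first function an item with no pubDate never comes back as new, and in the second the `seen_hashes` set only ever gets larger, which are the two problems I described.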