I use Google Reader to browse various RSS feeds. One of the things it does is show how many unread items there are. How does it keep track? When I view the source of any given RSS feed, it shows a finite list of, say, 20 items. If I don't check a feed for a while, I might have more unread items than those 20.

How does it do it? Does Google just use its resources to check the feeds frequently and store the items? Is there a way to page through RSS feeds?

1 Answer

RSS is just an XML file format; the feed document itself carries no record of what you have read. To remember what you've read before, you need to store that information yourself.

Some RSS providers have an API that lets you request RSS documents with a particular set of parameters, but if you're simply checking a feed for updates you'll have to remember what you've seen before (likely by storing the last retrieved copy and comparing it with the new one).
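The "store and compare" idea can be sketched in a few lines of Python. This is only an illustration, not how Google Reader actually works: the function names (`item_ids`, `merge_snapshot`, `unread_count`) and the two sample snapshots are made up for the example, and it assumes each item can be identified by its `<guid>`, falling back to `<link>`.

```python
import xml.etree.ElementTree as ET

# Two hypothetical snapshots of the same feed, fetched at different times.
SNAPSHOT_1 = """<rss version="2.0"><channel><title>Example</title>
<item><title>A</title><guid>a</guid></item>
<item><title>B</title><guid>b</guid></item>
</channel></rss>"""

SNAPSHOT_2 = """<rss version="2.0"><channel><title>Example</title>
<item><title>B</title><guid>b</guid></item>
<item><title>C</title><link>http://example.com/c</link></item>
</channel></rss>"""


def item_ids(rss_text):
    """Return an identifier for every <item> in an RSS 2.0 document, in feed order."""
    root = ET.fromstring(rss_text)
    ids = []
    for item in root.iter("item"):
        # Assumption: <guid> identifies an item; fall back to <link> if it is missing.
        ident = item.findtext("guid") or item.findtext("link")
        if ident and ident.strip():
            ids.append(ident.strip())
    return ids


def merge_snapshot(history, rss_text):
    """Append identifiers not seen before to `history` and return the new ones."""
    known = set(history)
    fresh = []
    for ident in item_ids(rss_text):
        if ident not in known:
            known.add(ident)
            fresh.append(ident)
    history.extend(fresh)
    return fresh


def unread_count(history, read_ids):
    """Count stored items that the user has not marked as read."""
    return sum(1 for ident in history if ident not in read_ids)


history = []                                  # in practice this would live in a database
first_new = merge_snapshot(history, SNAPSHOT_1)
second_new = merge_snapshot(history, SNAPSHOT_2)
unread = unread_count(history, read_ids={"a"})
```

Because the history is kept outside the feed, item `a` is still known after it has dropped out of the second snapshot, which is what lets a reader report more unread items than the feed currently contains.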

In short, yes: Google probably stores feed histories so it can show you items beyond what the current feed contains, and it probably polls feeds fairly frequently. Feeds can also carry metadata that tells a crawler how often to come back and check for updates; in RSS 2.0, for example, the channel's `<ttl>` element gives the number of minutes the feed may be cached.
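One example of such a hint is the RSS 2.0 channel-level `<ttl>` element, which states how many minutes a feed may be cached before it is fetched again. A minimal sketch of a poller reading it, assuming a one-hour fallback (the `poll_interval_minutes` name and the default are made up for this example, and whether any particular crawler honors the value is up to that crawler):

```python
import xml.etree.ElementTree as ET

DEFAULT_INTERVAL_MINUTES = 60  # assumed fallback; not defined by RSS


def poll_interval_minutes(rss_text, default=DEFAULT_INTERVAL_MINUTES):
    """Return the feed's <ttl> in minutes, or `default` if it is absent or invalid."""
    root = ET.fromstring(rss_text)          # root is the <rss> element
    ttl = root.findtext("channel/ttl")
    if ttl is None:
        return default
    try:
        minutes = int(ttl.strip())
    except ValueError:
        return default                      # non-numeric ttl: ignore the hint
    return minutes if minutes > 0 else default


feed_with_ttl = "<rss version='2.0'><channel><title>T</title><ttl>15</ttl></channel></rss>"
feed_without_ttl = "<rss version='2.0'><channel><title>T</title></channel></rss>"

interval_a = poll_interval_minutes(feed_with_ttl)
interval_b = poll_interval_minutes(feed_without_ttl)
```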

If updates arrive so quickly that items are pushed out of the feed before your crawler fetches it, you are out of luck unless the provider offers some other way of retrieving them.