In general, a website that offers RSS feeds advertises them in the HTML header of at least its home page; some sites do so on every single page.
Here is an example of such a declaration for an RSS feed:
<link href="http://snapwebsites.org/rss.xml"
title="Snap! A C++ Open Source CMS RSS"
type="application/rss+xml"
rel="alternate">
Note that the type varies slightly between websites. For example, some websites use text instead of application as the first part of the type (which is wrong, although XML is text...). There is also application/atom+xml for Atom feeds. A site may also offer both formats.
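As a minimal sketch of that header check, the following Python snippet collects the feed declarations from a page using only the standard library. The names FeedLinkParser and find_feed_links are made up for this illustration, and the text/... entries in FEED_TYPES are an assumption to tolerate the sites that use the wrong type.

```python
from html.parser import HTMLParser

# Feed MIME types. The text/* variants are an assumption added here to
# tolerate sites that use "text" instead of "application".
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "text/rss+xml",
    "text/atom+xml",
}


class FeedLinkParser(HTMLParser):
    """Collect (href, type, title) for every <link rel="alternate"> feed tag."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute names arrive lowercased; a missing value arrives as None.
        attributes = {name: (value or "") for name, value in attrs}
        rel_values = attributes.get("rel", "").lower().split()
        mime_type = attributes.get("type", "").strip().lower()
        href = attributes.get("href", "")
        if "alternate" in rel_values and mime_type in FEED_TYPES and href:
            self.feeds.append((href, mime_type, attributes.get("title", "")))


def find_feed_links(html_text):
    """Return the feed declarations found in an HTML document."""
    parser = FeedLinkParser()
    parser.feed(html_text)
    return parser.feeds
```

A page that declares both formats simply yields two entries, one per type.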
If that's not available, then you'd have to check the home page or other pages for anchor links to an RSS feed, which means:
- Parse the HTML
- Look for anchors
- Read the href attribute
- Check the destination to see whether it returns an XML file
- If you get an XML file (starts with <?xml ...) then check the root tag:
  - 'rss' -- RSS format (version is an attribute)
  - 'feed' -- Atom format
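The steps above can be sketched roughly as follows in Python, using only the standard library. AnchorParser, anchor_targets and classify_feed are hypothetical names chosen for this example, and downloading each destination is left to the caller (urllib.request.urlopen would be one way to do it).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorParser(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def anchor_targets(html_text, base_url):
    """Return the absolute URL of every anchor found in the page."""
    parser = AnchorParser()
    parser.feed(html_text)
    return [urljoin(base_url, href) for href in parser.hrefs]


def classify_feed(document):
    """Return 'rss', 'atom', or None depending on the XML root tag."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        return None  # not well-formed XML, so not a feed we can identify
    # ElementTree reports namespaced tags as '{namespace}name'; keep the name.
    local_name = root.tag.rsplit("}", 1)[-1].lower()
    if local_name == "rss":
        return "rss"
    if local_name == "feed":
        return "atom"
    return None
```

Passing the raw bytes of each downloaded destination to classify_feed avoids having to guess the encoding, since the XML parser reads it from the <?xml ... declaration.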
I have an example on the following page that includes the <link ...> tag in the header:
http://snapwebsites.org/implementation/feature-requirements/feed-feature-core-atom-rss-20-etc
I have to say, without that link, it will be quite a bit harder to find the RSS feeds. That being said, on many websites the feed files use an extension (.rss, .atom, .xml), and that can be used to simplify the search. Yet, more and more, feeds look like directory names: from the URL alone, .../blah or .../foo cannot be identified as either a standard HTML page or a feed, so the only way is to read the file at the destination and check its format. The Content-Type of the HTTP reply should be application/rss+xml or application/atom+xml too, like the type=... attribute of the header link.
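Both hints can be tested with a few lines of Python. This is only a sketch: has_feed_extension and is_feed_content_type are names invented here, and the extension test is a shortcut for ordering candidates, not proof that a URL is a feed.

```python
from urllib.parse import urlparse

FEED_EXTENSIONS = (".rss", ".atom", ".xml")
FEED_CONTENT_TYPES = {"application/rss+xml", "application/atom+xml"}


def has_feed_extension(url):
    """Cheap first filter: does the URL path end with a typical feed extension?"""
    path = urlparse(url).path.lower()
    return path.endswith(FEED_EXTENSIONS)


def is_feed_content_type(header_value):
    """Check a Content-Type header value, ignoring parameters such as charset."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in FEED_CONTENT_TYPES
```

Since servers do not always send the right Content-Type, a negative answer here is better treated as "unknown" and followed by a look at the document itself.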
As a side note, although very unlikely (I've not really seen it on a live website), it is possible to use the Link: ... HTTP header to indicate links, just the same as the <link ...> tag found in the HTML header. If you have access to the HTTP headers (here is how to do it in PHP), then it's worth looking for those headers to see whether one of them is an RSS feed.
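For what it's worth, here is a rough Python sketch of reading such a header value. It assumes the usual <url>; name="value" layout with links separated by commas, and it is a simplified regular-expression parser written for this example, not a complete implementation of the header grammar. The function name feed_links_from_header is made up.

```python
import re

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

# One link: <url> followed by any number of ';name=value' parameters.
LINK_RE = re.compile(
    r'<([^>]*)>((?:\s*;\s*[^;,=\s]+\s*(?:=\s*(?:"[^"]*"|[^;,\s]*))?)*)'
)
# One parameter: the value is either quoted or a bare token.
PARAM_RE = re.compile(r';\s*([^;,=\s]+)\s*(?:=\s*(?:"([^"]*)"|([^;,\s]*)))?')


def feed_links_from_header(header_value):
    """Return (url, type) for each rel="alternate" feed link in a Link header."""
    feeds = []
    for link in LINK_RE.finditer(header_value):
        url, raw_params = link.group(1), link.group(2)
        params = {}
        for name, quoted, bare in PARAM_RE.findall(raw_params):
            params[name.lower()] = quoted or bare
        rel_values = params.get("rel", "").lower().split()
        mime_type = params.get("type", "").strip().lower()
        if "alternate" in rel_values and mime_type in FEED_TYPES:
            feeds.append((url.strip(), mime_type))
    return feeds
```

A reply may carry several Link headers, so each value would go through this function in turn.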