4
votes

What is the best library for parsing feeds (RSS, Atom, ...) in Haskell?

I'm especially interested in the points:

  • Performance/memory
  • Encoding issues for non-English characters?
  • Correctness, detection of feed-type (RSS 1, RSS 2, Atom...), handling of non-valid feeds, etc.

I already stumbled upon feed; however, it uses Strings. How can this affect performance/memory, especially if ByteString.Lazy or Text are used elsewhere throughout the app?

Any experiences on that?


1 Answer

4
votes

Your intuition about trying to avoid String is right. The general rule of thumb in modern Haskell is to avoid String whenever you can and use Text or ByteString instead. However, in this case, I'm not aware of any direct drop-in replacement for the feed package.
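If you do stay with a String-based parser, one common approach is to convert to Text once, at the boundary, so the rest of the app never sees a String. A minimal sketch of that idea follows; RawEntry, Entry and toEntry are hypothetical names made up for illustration and are not part of the feed package's API.

```haskell
import           Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-in for whatever a String-based parser hands back.
data RawEntry = RawEntry
  { rawTitle :: String
  , rawLink  :: String
  }

-- The Text-based type the rest of the application works with.
data Entry = Entry
  { entryTitle :: Text
  , entryLink  :: Text
  } deriving (Eq, Show)

-- Convert once at the boundary; T.pack copies the characters into a
-- compact Text value, after which the String can be garbage collected.
toEntry :: RawEntry -> Entry
toEntry raw = Entry
  { entryTitle = T.strip (T.pack (rawTitle raw))
  , entryLink  = T.strip (T.pack (rawLink raw))
  }
```

The String still exists briefly while the feed is being parsed, so this limits the cost rather than removing it.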

In practice, because parsing feeds is usually network-bound, you shouldn't have any performance issues under normal circumstances.

However, if you really need high throughput and tight control of resources, it shouldn't be too difficult to write your own RSS parser using xml-conduit, which I'd say is the most mature iteratee-based XML parsing library out there. You can have a look at how it's being used by these packages.
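To give a rough idea of the amount of code involved, here is a minimal sketch that pulls the title and link out of every item in an RSS 2.0 document. Note the assumptions: for brevity it uses the DOM-style Text.XML / Text.XML.Cursor interface from xml-conduit rather than the streaming Text.XML.Stream.Parse interface, so it reads the whole document into memory; it only handles un-namespaced RSS 2.0 elements (no Atom, no RSS 1.0); and Item and parseRssItems are names I made up for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy as BL
import           Data.Text            (Text)
import qualified Data.Text            as T
import           Text.XML             (def, parseLBS)
import           Text.XML.Cursor      (Cursor, content, element,
                                       fromDocument, ($/), ($//), (&/))

-- Hypothetical result type: just the two fields used in this sketch.
data Item = Item
  { itemTitle :: Text
  , itemLink  :: Text
  } deriving (Eq, Show)

-- Parse the bytes as XML and collect every <item> element found
-- anywhere below the root. Malformed XML yields a Left with the error.
parseRssItems :: BL.ByteString -> Either String [Item]
parseRssItems bytes =
  case parseLBS def bytes of
    Left err  -> Left (show err)
    Right doc -> Right (map toItem (fromDocument doc $// element "item"))
  where
    toItem :: Cursor -> Item
    toItem c = Item (field "title") (field "link")
      where
        -- Text content of the named direct child; empty if it is absent.
        field name = T.strip (T.concat (c $/ element name &/ content))
```

Since parseLBS works on the raw bytes, the XML parser deals with the document's encoding itself and everything downstream is already Text. If memory use matters, the same extraction would have to be rewritten against the streaming interface.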