I'm trying to parse an XML document, but I want to filter and extract only a fixed number of children from a given node. For example:

<root>
    <node id="a" />
    <node id="b" />
    <node id="c" />
    <node id="d" />
</root>

And then if I execute the arrow getChildren >>> myFilter 2, I would get back only the nodes with id "a" and "b".

My intuition says I should use a state arrow to keep track of the count, but I don't know how to do that.

I tried to do it myself, but the result isn't exactly what I want, doesn't look very elegant, and doesn't work. I run my chain of arrows with runSLA, passing an integer as the initial state, and define:

takeOnly :: IOSLA Int XmlTree XmlTree
takeOnly = changeState (\s b -> s-1)
             >>> accessState (\s b -> if s >= 0 then b else Nothing)

But of course I can't return Nothing; I need to return an XmlTree. But I don't want to return anything at all!

There's probably a better way out there. Can you help me?

Thanks for your time and help!

1 Answer

It would probably be more idiomatic to use the combinators in Control.Arrow.ArrowList to handle this kind of thing.

The package specifically provides (>>.) :: a b c -> ([c] -> [d]) -> a b d, which is a "combinator for converting the result of a list arrow into another list". This allows us to use the take function that we already have for lists in this context.
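
To see why (>>.) is the right tool here and a plain filter after (>>>) is not, here is a toy model built on ordinary functions rather than HXT's real arrow types. The names ListFn, andThen, withList, children and this are made up for this sketch (andThen and withList stand in for (>>>) and (>>.)), and a "node" is assumed to be nothing more than a list of child ids:

```haskell
-- Toy stand-in for a list arrow: one input, a list of results.
-- This is an illustrative model only, not HXT's actual implementation.
newtype ListFn b c = ListFn { runListFn :: b -> [c] }

-- Models (>>>): run f, then run g on EACH result separately and concatenate.
andThen :: ListFn b c -> ListFn c d -> ListFn b d
andThen (ListFn f) (ListFn g) = ListFn (concatMap g . f)

-- Models (>>.): run f, then transform the WHOLE result list at once.
withList :: ListFn b c -> ([c] -> [d]) -> ListFn b d
withList (ListFn f) h = ListFn (h . f)

-- Hypothetical stand-in for getChildren: the "node" is just its child ids.
children :: ListFn [String] String
children = ListFn id

-- Identity arrow: returns its input as a single result.
this :: ListFn a a
this = ListFn (\x -> [x])

-- take 2 sees all four children at once, so only two survive.
firstTwo :: [String]
firstTwo = runListFn (children `withList` take 2) ["a", "b", "c", "d"]

-- Here take 2 only ever sees one-element lists, so nothing is dropped.
perElement :: [String]
perElement =
  runListFn (children `andThen` (this `withList` take 2)) ["a", "b", "c", "d"]

main :: IO ()
main = do
  print firstTwo    -- ["a","b"]
  print perElement  -- ["a","b","c","d"]
```

In this model, anything placed after the sequencing combinator is run once per child, which is why a stateless filter written as getChildren >>> myFilter 2 cannot count across children, while attaching take directly to the arrow that produces the children can.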

Here's a quick version of how you might use it:

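module Main where

-- Text.XML.HXT.Arrow is the pre-9.0 module name; with HXT 9 and later,
-- import Text.XML.HXT.Core instead.
import Text.XML.HXT.Arrow

-- Select the children of the input node and keep only the first n of them.
takeOnly :: (ArrowXml a) => Int -> a XmlTree XmlTree
takeOnly n = getChildren >>. take n

main :: IO ()
main = do
  let xml = "<root><node id='a' /><node id='b' />\
                  \<node id='c' /><node id='d' /></root>"

  -- readString yields the document root, so the first getChildren selects
  -- <root>, and takeOnly 2 then picks its first two children.
  print =<< runX (readString [] xml >>> getChildren >>> takeOnly 2)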

This, I believe, does approximately what you're looking for:

travis@sidmouth% ./ArrowTake
[NTree (XTag (LP node) [NTree (XAttr (LP id)) [NTree (XText "a") []]]) [],
 NTree (XTag (LP node) [NTree (XAttr (LP id)) [NTree (XText "b") []]]) []]

No IOSLA required. Note that I've also changed the function type a little: takeOnly now takes the count as an argument and selects the children itself. This version seems nicer to me, but you could easily convert it to something more like the type in your version.