2
votes

I'm trying to get my head around HXT, a Haskell library for parsing XML that uses arrows. For my use case I'd rather not use deep, because there are cases where <outer_tag><payload_tag>value</payload_tag></outer_tag> must be treated differently from <outer_tag><inner_tag><payload_tag>value</payload_tag></inner_tag></outer_tag>. While replacing deep, I ran into behaviour that looks like it should work but doesn't.

I've managed to come up with a test case based on this example from the docs:

{-# LANGUAGE Arrows, NoMonomorphismRestriction #-}
module Main where

import Text.XML.HXT.Core

data Guest = Guest { firstName, lastName :: String }
  deriving (Show, Eq)


getGuest = deep (isElem >>> hasName "guest") >>> 
  proc x -> do
    fname <- getText <<< getChildren <<< deep (hasName "fname") -< x
    lname <- getText <<< getChildren <<< deep (hasName "lname") -< x
    returnA -< Guest { firstName = fname, lastName = lname }

getGuest' = deep (isElem >>> hasName "guest") >>> 
  proc x -> do
    fname <- getText <<< getChildren <<< (hasName "fname") <<< getChildren -< x
    lname <- getText <<< getChildren <<< (hasName "lname") <<< getChildren -< x
    returnA -< Guest { firstName = fname, lastName = lname }

getGuest'' = deep (isElem >>> hasName "guest") >>> getChildren >>>
  proc x -> do
    fname <- getText <<< getChildren <<< (hasName "fname") -< x
    lname <- getText <<< getChildren <<< (hasName "lname") -< x
    returnA -< Guest { firstName = fname, lastName = lname }


driver finalArrow = runX (readDocument [withValidate no] "guestbook.xml" >>> finalArrow)

main = do 
  guests <- driver getGuest
  print "getGuest"
  print guests

  guests' <- driver getGuest'
  print "getGuest'"
  print guests'

  guests'' <- driver getGuest''
  print "getGuest''"
  print guests''

Between getGuest and getGuest' I replace each inner deep with the matching number of getChildren calls, and the resulting function still works. I then factor the innermost getChildren out of the do block, in front of the proc, and the resulting function returns an empty list. The output is:

"getGuest"
[Guest {firstName = "John", lastName = "Steinbeck"},Guest {firstName = "Henry", lastName = "Ford"},Guest {firstName = "Andrew", lastName = "Carnegie"},Guest {firstName = "Anton", lastName = "Chekhov"},Guest {firstName = "George", lastName = "Washington"},Guest {firstName = "William", lastName = "Shakespeare"},Guest {firstName = "Nathaniel", lastName = "Hawthorne"}]
"getGuest'"
[Guest {firstName = "John", lastName = "Steinbeck"},Guest {firstName = "Henry", lastName = "Ford"},Guest {firstName = "Andrew", lastName = "Carnegie"},Guest {firstName = "Anton", lastName = "Chekhov"},Guest {firstName = "George", lastName = "Washington"},Guest {firstName = "William", lastName = "Shakespeare"},Guest {firstName = "Nathaniel", lastName = "Hawthorne"}]
"getGuest''"
[]

I feel like this should be a valid transformation to perform, but my understanding of arrows is a little shaky. Am I doing something wrong? Is this a bug that I should report?

I'm using HXT version 9.3.1.3 (the latest at the time of writing). ghc --version prints "The Glorious Glasgow Haskell Compilation System, version 7.4.1". I've also tested on a box with ghc 7.6.3 and got the same result.

The XML file has the following repetitive structure (the full file can be found here):

<guestbook>
  <guest>
    <fname>John</fname>
    <lname>Steinbeck</lname>
  </guest>
  <guest>
    <fname>Henry</fname>
    <lname>Ford</lname>
  </guest>
  <guest>
    <fname>Andrew</fname>
    <lname>Carnegie</lname>
  </guest>
</guestbook>
2
Could you post an example XML file to go with this? - bheklilr
@bheklilr Okay, did that. - Gareth Charnock

2 Answers

3
votes

In getGuest'' you have

... (hasName "fname") -< x
... (hasName "lname") -< x

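In getGuest'' the getChildren sits in front of the proc, so x is bound to each child of a guest element in turn, and hasName tests x itself. You are therefore restricting to the case where x is named "fname" and x is also named "lname", which isn't satisfied by any x!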
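
One way to avoid deep for the payload while keeping both lookups alive is to leave x bound to the guest element and step down to the direct children inside each line. The following is only a sketch under a few assumptions: getGuestDirect and sampleXml are names made up for this illustration, the document is parsed from an in-memory string with xread and run with runLA instead of readDocument and runX, and /> is HXT's shorthand for "then getChildren, then".

```haskell
{-# LANGUAGE Arrows #-}
module Main where

import Text.XML.HXT.Core

data Guest = Guest { firstName, lastName :: String }
  deriving (Show, Eq)

-- x stays bound to the <guest> element. Each line walks down to a direct
-- child on its own, so the two hasName tests look at different nodes.
-- (getGuestDirect is a hypothetical name, not part of the question.)
getGuestDirect :: ArrowXml a => a XmlTree Guest
getGuestDirect = deep (isElem >>> hasName "guest") >>>
  proc x -> do
    fname <- getChildren >>> hasName "fname" /> getText -< x
    lname <- getChildren >>> hasName "lname" /> getText -< x
    returnA -< Guest { firstName = fname, lastName = lname }

-- Hypothetical in-memory sample mirroring the shape of guestbook.xml.
sampleXml :: String
sampleXml =
  "<guestbook>"
  ++ "<guest><fname>John</fname><lname>Steinbeck</lname></guest>"
  ++ "<guest><fname>Henry</fname><lname>Ford</lname></guest>"
  ++ "</guestbook>"

main :: IO ()
main = print (runLA (xread >>> getGuestDirect) sampleXml)
```

This is the same idea as getGuest', just written left to right: an fname nested one level deeper than a direct child of guest would not be picked up, which is the distinction the question asks for.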

2
votes

I've managed to work out why the construction is interpreted the way it is. The arrow translation found here provides a base to work from:

addA :: Arrow a => a b Int -> a b Int -> a b Int
addA f g = proc x -> do
                y <- f -< x
                z <- g -< x
                returnA -< y + z

Becomes:

addA :: Arrow a => a b Int -> a b Int -> a b Int
addA f g = arr (\ x -> (x, x)) >>>
           first f >>> arr (\ (y, x) -> (x, y)) >>>
           first g >>> arr (\ (z, y) -> y + z)

From this we can, by analogy, derive:

getGuest''' = preproc >>>
           arr (\ x -> (x, x)) >>>
           first f >>> arr (\ (y, x) -> (x, y)) >>>
           -- after first g, z holds the lname text and y holds the fname text
           first g >>> arr (\ (z, y) -> Guest {firstName = y, lastName = z})

  where
    preproc = deep (isElem >>> hasName "guest") >>> getChildren
    f = getText <<< getChildren <<< (hasName "fname")
    g = getText <<< getChildren <<< (hasName "lname")

In HXT, the arrows can be imagined as streams of values flowing through filters. arr (\x->(x,x)) does not "split the stream", as I'd hoped. Instead it creates a stream of tuples (x, x): first f drops every tuple whose x fails f, and first g then drops every survivor whose x fails g. Since no node is named both fname and lname, f and g are mutually exclusive and there are no survivors.
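
The filtering behaviour can be reproduced without HXT. The sketch below models a list arrow with Kleisli [] from base, on the assumption that this is close enough to HXT's pure list arrow (a wrapper around a -> [b]) for the purpose; keepIf and both are names invented for the illustration, with keepIf playing the role of a filter such as hasName.

```haskell
module Main where

import Control.Arrow

-- A filter in the list-arrow sense: pass x through if p holds,
-- otherwise produce no result at all. (Hypothetical helper.)
keepIf :: (a -> Bool) -> Kleisli [] a a
keepIf p = Kleisli (\x -> [x | p x])

-- Same shape as the desugared proc block: f and g both see the same x.
both :: Kleisli [] a b -> Kleisli [] a c -> Kleisli [] a (b, c)
both f g =
  arr (\x -> (x, x)) >>>
  first f >>> arr (\(y, x) -> (x, y)) >>>
  first g >>> arr (\(z, y) -> (y, z))

main :: IO ()
main = do
  -- Mutually exclusive filters: no input survives both.
  print (runKleisli (both (keepIf even) (keepIf odd)) =<< [1 .. 6 :: Int])
  -- prints []
  -- Overlapping filters: only inputs passing both come out.
  print (runKleisli (both (keepIf even) (keepIf (> 3))) =<< [1 .. 6 :: Int])
  -- prints [(4,4),(6,6)]
```

An input only produces a result when every line of the proc block produces one for that same input, so binding x to a node that can satisfy both lookups (the guest element) is what makes the other variants work.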

The versions with getChildren inside the do block worked because there the tuple stream carried nodes from one level further up the XML document, namely whole guest elements looking something like

<guest>
    <fname>John</fname>
    <lname>Steinbeck</lname>
</guest>

and so were not mutually exclusive.