11
votes

I am trying to parse nested lists in Agda. I searched on google and the closest I have found is parsing addressed in Haskell, but usually libraries like "parsec" are used that are not available in Agda.

So I would like to parse "((1,2,3),(4,5,6))" with a result type of (List (List Nat)).

And further nested lists should be supported (up to depth 5), e.g., depth 3 would be (List (List (List Nat))).

My code is very long and cumbersome, and it only works for (List (List Nat)) but not for further nested lists. I didn't make any progress on my own.

If helpful, I would like to reuse splitBy from the first answer of one of my older posts.

NesList : ℕ → Set
NesList 0 = ℕ -- this case is easy
NesList 1 = List ℕ -- this case is easy
NesList 2 = List (List ℕ) 
NesList 3 = List (List (List ℕ))
NesList 4 = List (List (List (List ℕ)))
NesList 5 = List (List (List (List (List ℕ)))) -- I am only interested to list depth 5
NesList _ = ℕ -- this is a hack, but I think okay for now


-- My implementation is *not* shown here
--
--
-- (it's about 80 lines long and uses 3 different functions
parseList2 : List Char → Maybe (List (List ℕ))
parseList2 _ = nothing -- dummy result


parseList : (dept : ℕ) → String → Maybe (NesList dept)
parseList 2 s = parseList2 (toList s)
parseList _ _ = nothing



-- Test Cases that are working (in my version)

p1 : parseList 2 "((1,2,3),(4,5,6))" ≡ just ((1 ∷ 2 ∷ 3 ∷ []) ∷ (4 ∷ 5 ∷ 6 ∷ []) ∷ [])
p1 = refl


p2 : parseList 2 "((1,2,3),(4,5,6),(7,8,9,10))" ≡ just ((1 ∷ 2 ∷ 3 ∷ []) ∷ (4 ∷ 5 ∷ 6 ∷ []) ∷ (7 ∷ 8 ∷ 9 ∷ 10 ∷ []) ∷ [])
p2 = refl

p3 : parseList 2 "((1),(2))" ≡ just ((1 ∷ []) ∷ (2 ∷ []) ∷ [])
p3 = refl

p4 : parseList 2 "((1,2))" ≡ just ((1 ∷ 2 ∷ []) ∷ [])
p4 = refl

-- Test Cases that are not working 
-- i.e., List (List (List Nat))

lp5 : parseList 3 "(((1,2),(3,4)),((5,6),(7,8)))" ≡ just (  ((1 ∷ 2 ∷ []) ∷ (3 ∷ 4 ∷ []) ∷ []) ∷ ((5 ∷ 6 ∷ []) ∷ (7 ∷ 8 ∷ []) ∷ []) ∷ [])
lp5 = refl

EDIT1 **

Connor's talk at ICFP is online -- the title is "Agda-curious?".
It is from two days ago. Check it out!!
.
See the video:
http://www.youtube.com/watch?v=XGyJ519RY6Y

--
EDIT2:
I found a link that seems to be almost the code I need for my parsing.
There is a tokenize function provided:
https://github.com/fkettelhoit/agda-prelude/blob/master/Examples/PrefixCalculator.agda

--
EDIT3:
I finally found a simple combinator library that should be fast enough. There are no examples included in the library so I still have to look how to solve the problem.
Here is the link:

https://github.com/crypto-agda/agda-nplib/blob/master/lib/Text/Parser.agda

There is more agda-code from Nicolas Pouillard online:
https://github.com/crypto-agda

3
My agda isn't so good, so this can probably be ignored, but shouldn't it be possible to have something like parseList' elementParser s to parse a list of items which are parsed by elementParser, and then use parseList 1 s = parseList' parseAnInt s, parseList n s = parseList' (parseList (n-1)) s. i.e. express the recursion naturally. (Similarly, couldn't NesList 0 = ℕ, NesList n = List (NesList (n-1)) be used.)huon
Any time you want to nest a functor within itself a variable number of times, use a free monad: haskellforall.com/2012/06/…Gabriella Gonzalez
The free monad approach is probably inappropriate to this problem. It hands the decision of how deeply nested the list^n should be to the producer of the data, when the OP seems to want the acceptable nestedness of the data to be under the control of the data consumer. The free monad on list is thus too big a type for what's specified. Junk in types is unfortunate and avoidable. This is a job for parser combinators. Nils Anders Danielsson has written a parser combinator library for Agda. It might even ship with the usual distribution.pigworker
@pigworker Thanks for you comment. I know the parser combinators from Nils. As far as I understood it's a academic project and last time I checkt (1-2 years ago), the combinators were extremely inefficient. (About 10 characters were the maximum.)mrsteve
I added a link to a "tokenize" function, perhaps this can help. Although I don't see the solution yet.mrsteve

3 Answers

2
votes

I don't have access to an agda implementation right now, so I can't check syntax, but this is how I would address it.

First, NesList can be simplified.

NesList 0 = ℕ
NesList (succ n) = List (NesList n)

Then you need a general-purpose list parsing function. Instead of Maybe you could use List to specify alternative parses. The return value is a successful parse and the remainder of the string.

Parser : Set -> Set
Parser a = List Char -> Maybe (Pair a (List Char))

This, given a parser routine for type x, parses a parenthesis-delineated comma-separated list of x.

parseGeneralList : { a : Set } Parser a -> Parser (List a)
parseGeneralList = ...implement me!...

This parses a general NesList.

parseNesList : (a : ℕ) -> Parser (NesList a)
parseNesList 0 = parseNat
parseNesList (succ n) = parseGeneralList (parseNesList n)

Edit: As was pointed out in the comments, code using this kind of Parser won't pass agda's termination checker. I'm thinking that if you want to do parser combinators you need a Stream based setup.

1
votes

I'm a bit late to the party but I am currently writing a total parser combinators library and I have a fairly compact solution re-using the neat NesList type suggested by @NovaDenizen.

I use difference lists but the basic ones would do too (we'd simply have to replace DList.toList with List.reverse because chainl1 aggregates values left to right).

NList : Set → ℕ → Set
NList A zero    = A
NList A (suc n) = List (NList A n)

NList′ : {A : Set} → [ Parser A ] →
         (n : ℕ) → [ Parser (NList A n) ]
NList′ A zero    = A
NList′ A (suc n) = parens $ return $ DList.toList <$>
                   chainl1 (DList.[_] <$> NList′ A n)
                           (return $ DList._++_ <$ char ',')

All the test cases pass successfully. I have added the example to the (monolithic) poc file so you can check for yourself

_ : "((1,2,3),(4,5,6))" ∈ NList′ decimal 2
_ = (1 ∷ 2 ∷ 3 ∷ []) ∷ (4 ∷ 5 ∷ 6 ∷ []) ∷ [] !

_ : "((1,2,3),(4,5,6),(7,8,9,10))" ∈ NList′ decimal 2
_ = (1 ∷ 2 ∷ 3 ∷ []) ∷ (4 ∷ 5 ∷ 6 ∷ []) ∷ (7 ∷ 8 ∷ 9 ∷ 10 ∷ []) ∷ [] !

_ : "((1),(2))" ∈ NList′ decimal 2
_ = (1 ∷ []) ∷ (2 ∷ []) ∷ [] !

_ : "((1,2))" ∈ NList′ decimal 2
_ = (1 ∷ 2 ∷ []) ∷ [] !

_ : "(((1,2),(3,4)),((5,6),(7,8)))" ∈ NList′ decimal 3
_ = ((1 ∷ 2 ∷ []) ∷ (3 ∷ 4 ∷ []) ∷ []) ∷
    ((5 ∷ 6 ∷ []) ∷ (7 ∷ 8 ∷ []) ∷ []) ∷ [] !
0
votes

I post here my solution using parser combinators. It uses the agda-nplib library is on github. The code is far from optimal but it works.

module NewParser where

-- dummy
open import Data.Maybe
open import Data.Bool

-- includes
open import Data.List hiding (map)

-- ***
-- WAS PRELUDE IMPORTS
open import StringHelpers using (charToℕ; stringToℕ)
open import Data.String hiding (_==_; _++_)
open import Data.Char
open import Function
open import Data.Nat
open import Data.Unit
open import Data.Maybe


-- https://github.com/crypto-agda/agda-nplib/tree/master/lib/Text
open import Text.Parser
open import ParserHelpers



--- ****
--- Lessons Learned, this is the key:
--- (was a basic error that tyeps where too specific, generalisation not possible)

-- parseList : {A : Set} → Parser (Maybe A) → Parser (Maybe A) → ℕ → Parser (List (Maybe A))
-- converted to
-- parseList : {A : Set} → Parser A → Parser A → ℕ → Parser (List A)



-- *****
-- General ... Normal List (depth 1)

parseList : {A : Set} → Parser A → Parser A → ℕ → Parser (List A)
parseList oneMatcher manyMatcher n = ⟪ _++_ · (map toL oneMatcher) · (many n manyMatcher) ⟫

parseBracketList : {A : Set} → Parser A → Parser A → ℕ → Parser (List A)
parseBracketList oneMatcher manyMatcher n = bracket '(' (parseList oneMatcher manyMatcher n) ')'

parseCommaListConvert : {A : Set} → (List Char → A) → (Parser (List Char)) → ℕ → Parser (List A)
parseCommaListConvert convert parser = parseBracketList (⟪ convert · parser ⟫) (⟪ convert · parseOne "," *> parser ⟫) 


-- For Numbers

number : Parser (List Char)
number = manyOneOf (toList "1234567890")

parseNumList : ℕ → Parser (List (Maybe ℕ))
parseNumList = parseCommaListConvert charsToℕ number


-- Nested List (depth 2)
--

parseListListNum : ℕ → Parser (List (List (Maybe ℕ)))
parseListListNum n = parseList (parseNumList n) ((parseOne ",") *> (parseNumList n)) n


parseManyLists : ℕ → Parser (List (List (Maybe ℕ)))
parseManyLists n = bracket '(' (parseListListNum n) ')'



-- Run the Parsers
--

open import MaybeEliminatorHelper

-- max number of terms is the number of characters in the string
-- this is for the termination checker
runParseList' : String → Maybe (List (Maybe ℕ))
runParseList' s = runParser (parseNumList (strLength s)) (toList s)

runParseList : String → Maybe (List ℕ)
runParseList = maybe-list-maybe-eliminate ∘ runParseList'

-- nested list
runParseNesList' : String → Maybe (List (List( Maybe ℕ)))
runParseNesList' s = runParser (parseManyLists (length (toList s))) (toList s)

runParseNesList : String → Maybe (List (List ℕ))
runParseNesList = maybe-list-list-maybe-eliminate ∘ runParseNesList' 

Here is are my helper functions:

module MaybeEliminatorHelper where

open import Data.Maybe
open import Category.Monad
open import Function

open import Data.List

open import Category.Functor


sequence-maybe : ∀ {a} {A : Set a} → List (Maybe A) → Maybe (List A)
sequence-maybe = sequence Data.Maybe.monad


join : {A : Set} → Maybe (Maybe A) → Maybe A
join m = m >>= id
  where 
    open RawMonad Data.Maybe.monad


maybe-list-elem : {A : Set} → Maybe (List (Maybe A)) → Maybe (List A)
maybe-list-elem mlm = join (sequence-maybe <$> mlm)
  where open RawFunctor functor

{-
sequence-maybe : [Maybe a] -> Maybe [a]

join :: Maybe (Maybe a) -> Maybe a


Maybe (List (List (Maybe A))
  Maybe.fmap (List.fmap sequenc-maybe)
Maybe (List (Maybe (List A))
  Maybe.fmap sequence-maybe
Maybe (Maybe (List (List A)))
  join
Maybe (List (List A))



join . Maybe.fmap sequence-maybe . Maybe.fmap (List.fmap sequenc-maybe)

join . Maybe.fmap (sequence-maybe . List.fmap sequenc-maybe)
(short form)

-}

maybe-list-elem2 : {A : Set} → Maybe (List (List (Maybe A))) → Maybe (List (List A)) 
maybe-list-elem2 = join ∘ Mfmap (sequence-maybe ∘ Lfmap sequence-maybe)
  where
    open RawMonad Data.Maybe.monad hiding (join) renaming (_<$>_ to Mfmap)
    open RawMonad Data.List.monad hiding (join) renaming (_<$>_ to Lfmap)


maybe-list-maybe-eliminate = maybe-list-elem

maybe-list-list-maybe-eliminate = maybe-list-elem2

Further helper functions:

-- ***
-- WAS PRELUDE IMPORTS
open import StringHelpers using (charToℕ; stringToℕ)

open import Data.String hiding (_==_)
open import Data.Char
open import Function
open import Data.Nat
open import Data.Unit
open import Data.Maybe



open import Text.Parser

open import Data.List


-- mini helpers
--

parseOne : String → Parser Char
parseOne = oneOf ∘ toList 

strLength : String → ℕ
strLength = length ∘ toList 


-- misc helpers
--

charsToℕ : List Char → Maybe ℕ
charsToℕ [] = nothing
charsToℕ xs = stringToℕ (fromList xs)

toL : ∀ {a} {A : Set a} → A → List A
toL x = x ∷ []


-- test
l : List (Maybe ℕ)
l = (just 3) ∷ (just 3) ∷ []


-- Parser Helpers Nicolas
--
isSpace : Char → Bool
isSpace = (_==_ ' ')

spaces : Parser ⊤
spaces = manySat isSpace *> pure _

-- decide if seperator before after brackets is spaces
someSpaces : Parser ⊤
someSpaces = someSat isSpace *> pure _

tok : Char → Parser ⊤
tok c = spaces *> char c *> pure _

bracket : ∀ {A} → Char → Parser A → Char → Parser A
bracket start p stop = tok start *> p <* tok stop

And some test cases:

tn09 : pList "12,13,,14" ≡ nothing
tn09 = refl

tn08 : pList "" ≡ nothing
tn08 = refl

tn07 : pList "12,13,14" ≡ nothing
tn07 = refl

-- not working tn06 : pList "(12,13,14,17)," ≡ nothing
-- not working tn06 = refl

tn05 : pList "aa,bb,cc" ≡ nothing
tn05 = refl

tn04 : pList "11" ≡ nothing 
tn04 = refl

tn03 : pList "(11,12,13)" ≡ just (11 ∷ 12 ∷ 13 ∷ [])
tn03 = refl


-- new testcases
tn11 : pList2 "((1,2,3),(4,5,6),(7,8,9))" ≡ just ((1 ∷ 2 ∷ 3 ∷ []) ∷ (4 ∷ 5 ∷ 6 ∷ []) ∷ (7 ∷ 8 ∷ 9 ∷ []) ∷ [])
tn11 = refl

-- old testcases
p1 : pList2 "((1,2,3),(4,5,6))" ≡ just ((1 ∷ 2 ∷ 3 ∷ []) ∷ (4 ∷ 5 ∷ 6 ∷ []) ∷ [])
p1 = refl


p2 : pList2 "((1,2,3),(4,5,6),(7,8,9,10))" ≡ just ((1 ∷ 2 ∷ 3 ∷ []) ∷ (4 ∷ 5 ∷ 6 ∷ []) ∷ (7 ∷ 8 ∷ 9 ∷ 10 ∷ []) ∷ [])
p2 = refl

p3 : pList2 "((1),(2))" ≡ just ((1 ∷ []) ∷ (2 ∷ []) ∷ [])
p3 = refl

p4 : pList2 "((1,2))" ≡ just ((1 ∷ 2 ∷ []) ∷ [])
p4 = refl

I am open for suggestions to improve the code.