9
votes

I'd like to implement this grammar rule using Haskell's parsec library:

((a | b | c)* (a | b))?

Which is a parser rule that accepts an optional (i.e. potentially empty) string. If the string it acccepts is not empty, then it can be consumed by passing through zero or more occurrences of the a b or c parsers, but the accepted string by the outer most ? optional parser must be consumed either by parser a or b, but not c. Here's an example:

module Main where

import Text.Parsec
import Text.Parsec.Text

a,b,c :: GenParser () Char
a = char 'a'
b = char 'b'
c = char 'c'

-- ((a | b | c)* (a | b))?
myParser = undefined

shouldParse1,shouldParse2,shouldParse3,
      shouldParse4,shouldFail :: Either ParseError String
-- these should succeed
shouldParse1 = runParser myParser () "" "" -- because ? optional
shouldParse2 = runParser myParser () ""  "b"
shouldParse3 = runParser myParser () "" "ccccccb"
shouldParse4 = runParser myParser () "" "aabccab"

-- this should fail because it ends with a 'c'
shouldFail = runParser myParser () "" "aabccac"

main = do
  print shouldParse1
  print shouldParse2
  print shouldParse3
  print shouldParse4
  print shouldFail

A first attempt might look like this:

myParser = option "" $ do
  str <- many (a <|> b <|> c)
  ch  <- a <|> b
  return (str ++ [ch])

But the many just consumes all 'a' 'b' and 'c' characters in each test case, leaving a <|> b no characters to consume.

The question:

Using parsec combinators, what is the correct implementation of ((a | b | c)* (a | b))? to define myParser?

1
Maybe parse (a|b|c)+ and reject it later if it ends with c? - Doug McClean

1 Answers

4
votes

We can also state this slightly different: c in your parser may only succeed if it's followed by any token, which can be done with a single lookAhead:

myParser = many (a <|> b <|> (c <* (lookAhead anyToken <?> "non C token"))) <* eof