
I'm trying to figure out how to write a Haskell Parsec parser for consuming any of these Ruby expressions:

   hello("test", 'test2') 
   my_variable
   hello(world("test"))
   (hello + " " + world)

When the parser starts at the beginning of any of these items, it should return the whole item as a string and stop at the end of the item. If an item is followed by a comma, that comma should not be consumed.

I've tried a few times to write a parser for these types of expressions but with no success. It's not necessary to parse the sub-components of these expressions -- I don't need a full AST. I just need to consume and capture these sorts of chunks.

I thought maybe an adequate heuristic could involve just balancing any parentheses and eating all the content within outer balanced parentheses, in addition to any preceding identifier. But I need some help writing a parser that works this way.

It doesn't make sense to try to parse without parsing everything. Either (a) write a structured, correct parser, or (b) write something that eats the input, does some counting and tracking but doesn't actually parse it. You'll find it hard to do (b) with parsec. The key question is correctness: how will you parse this(example + "(with" + (weird ("bracketing)?")+"(")) unless you parse strings? You should bite the bullet and write a string parser first, then an identifier parser, then mutually recursive expression, argumentList and function parsers. You don't have to return an AST. - AndrewC
That makes sense. Thanks for the advice. - dan
@AndrewC since your comment appeared to answer this, do you mind posting it as an answer to move this off "unanswered questions"? - sclv
@sclv Done. I'm not sure it's a stupendous answer, but I completely take the point about the list. - AndrewC

1 Answer


It doesn't make sense to try to parse these expressions without parsing everything inside them. Either (a) write a structured, correct parser, or (b) write something that eats the input and does some counting and tracking but doesn't actually parse it. You'll find (b) hard to do with Parsec. The key question is correctness: unless you parse string literals, how will you handle input like this, where parentheses appear inside strings?

   this(example + "(with" + (weird ("bracketing)?")+"("))

You should bite the bullet and write a string parser first, then an identifier parser, then mutually recursive expression, argumentList and function parsers. You don't have to return an AST.
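A minimal sketch of that layering is below. It rests on several assumptions: only the constructs in the question are covered (quoted strings with backslash escapes, identifiers, calls, parenthesised groups, and the binary operators `+ - * /`), and the names `consumed`, `chunk`, `quoted`, `term` and `operator` are made up for illustration. Rather than building an AST, each parser returns `()` and the hypothetical `consumed` helper recovers the matched text by comparing the remaining input before and after.

```haskell
import Control.Monad (void)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run a parser and return the exact text it consumed (String input only).
consumed :: Parser a -> Parser String
consumed p = do
  before <- getInput
  _ <- p
  after <- getInput
  return (take (length before - length after) before)

-- A single- or double-quoted string. Assumption: a backslash escapes
-- the next character; interpolation is not handled.
stringLit :: Parser ()
stringLit = quoted '"' <|> quoted '\''

quoted :: Char -> Parser ()
quoted q = void (between (char q) (char q) (skipMany piece))
  where
    piece = (char '\\' *> anyChar) <|> noneOf [q, '\\']

-- Simplified Ruby identifier (no trailing ? or !).
identifier :: Parser ()
identifier = (letter <|> char '_') *> skipMany (alphaNum <|> char '_')

-- A binary operator with optional surrounding whitespace. The try makes
-- it back out of whitespace that is not followed by an operator.
operator :: Parser ()
operator = try (spaces *> oneOf "+-*/" *> spaces)

-- expression, term, function and argumentList are mutually recursive.
expression :: Parser ()
expression = void (term `sepBy1` operator)

term :: Parser ()
term = stringLit <|> parens expression <|> function

parens :: Parser a -> Parser ()
parens p = void (between (char '(' *> spaces) (spaces *> char ')') p)

-- An identifier, optionally followed by an argument list. Spaces before
-- the opening parenthesis are tolerated, as in: weird ("x")
function :: Parser ()
function = identifier *> optional (try (skipMany (char ' ') *> lookAhead (char '(')) *> argumentList)

argumentList :: Parser ()
argumentList = parens (expression `sepBy` try (spaces *> char ',' *> spaces))

-- One whole item, returned as a string; a following comma is left unconsumed.
chunk :: Parser String
chunk = consumed expression
```

For example, `parse chunk "" "hello(world(\"test\")), more"` should stop before the comma. Numbers, symbols, blocks and method chains would each need another alternative in `term`.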