
I've been trying to make a parser for a (very) simple language that looks like this:

block{you are a cow too blkA{ but maybe not} and so is he} hear me moo blockZ{moooooo}

I can break it apart using regexes:

.*?[^ ]*?\\{
.*?\\}

which would essentially keep eating characters until it found something that matches [^ ]*?\\{ or \\}: the start or end of a block. My question is, if I want to do it using Scala's Parser Combinators, how do I do that? I currently have:

   def expr: Parser[Any] = (block | text)+
   def text = ".+?".r
   def block = "[^ ]*?\\{".r ~ expr ~ "}"

but this doesn't work:

parsed: List(b, l, o, c, k, {, y, o, u, a, r, e, a, c, o, w, t, o, o, b, l, k, A, {, b, u, t, m, a, y, b, e, n, o, t, }, a, n, d, s, o, i, s, h, e, }, h, e, a, r, m, e, m, o, o)

It seems that the block parser is not firing, so the text parser fires repeatedly instead. But when I remove the text parser:

   def expr: Parser[Any] = (block)+

I get:

failure: string matching regex `[^ ]*?\{' expected but `y' found

block{you are a cow too blkA{ but maybe not} and so is he} hear me moo  
      ^

So the block parser clearly does work, just not when the text parser is present. What's happening? And is there a "proper" way of doing this for so basic a grammar?

EDIT: Changed the title, since it's not so much about the reluctance anymore as just solving the problem

EDIT: I now have this:

   def expr: Parser[Any] = (block | text)+
   def text = "[^\\}]".r
   def block = "[^ ]*?\\{".r ~ expr ~ "}"

The logic behind this is that for each character, it tests whether or not it is the start of a block. If it isn't, it moves on to the next character. This gives me:

parsed: List(((block{~List(y, o, u, a, r, e, a, c, o, w, t, o, o, ((blkA{~List(b, u, t, m, a, y, b, e, n, o, t))~}), a, n, d, s, o, i, s, h, e))~}), h, e, a, r, m, e, m, o, o)

which is kind of correct. It parses the non-block characters one by one, though, which is probably a performance problem (I think?). Is there any way to parse all those non-block characters at once and leave them in one big string?

Please don't add new questions to an existing one. Enhance, sure, but make new questions for new questions. It's parsing one by one because you used a non-greedy star. Just drop the non-greediness. - Daniel C. Sobral

1 Answer

The problem is that text is consuming all closing curly braces (}). It goes like this:

expr -> block -> expr -> text.+ (until all input is consumed)

At this point it exits the inner expr and tries to parse the closing }, which no longer exists because text already consumed it. The block therefore fails, and the parser falls back to text in the first expr.
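One possible way around this, sketched here as an illustration rather than as the definitive grammar, is to stop text from matching braces at all and to stop it from swallowing a word that runs straight into {. The names Node, Text, Block, BlockParser and parseDoc are made up for this example, and the regexes assume that a block name is a run of non-whitespace, non-brace characters immediately followed by {. It needs the scala-parser-combinators library on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical result types, introduced only for this sketch.
sealed trait Node
case class Text(value: String) extends Node
case class Block(name: String, body: List[Node]) extends Node

object BlockParser extends RegexParsers {
  // A text word: non-space, non-brace characters that do NOT run into '{'
  // (a word that runs into '{' is a block name, not text).
  private val textWord = """[^\s{}]+(?![^\s{}]*\{)"""

  def expr: Parser[List[Node]] = rep(block | text)

  // One regex match grabs a whole run of words, so text comes back as a
  // single string instead of one character at a time.
  def text: Parser[Text] =
    (textWord + """(?:\s+""" + textWord + ")*").r ^^ (s => Text(s))

  // The opener is a single regex so the name must touch the '{'.
  def block: Parser[Block] =
    ("""[^\s{}]*\{""".r ^^ (_.dropRight(1))) ~ expr <~ "}" ^^ {
      case name ~ body => Block(name, body)
    }

  def parseDoc(input: String): ParseResult[List[Node]] = parseAll(expr, input)
}
```

Because RegexParsers skips leading whitespace before every regex and literal by default, the Text values come back without surrounding spaces. Calling BlockParser.parseDoc on the sample input from the question should give a Block named block, a Text, and a Block named blockZ at the top level.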

You can use log to see what's going on when you parse.
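As a rough sketch of what that looks like: log is a method on Parsers that takes a parser and a label, and prints a line when the parser is tried and another with its result. Below, the original grammar from the question is wrapped with it; the object name LoggedParser and the labels are made up for illustration, and rep1 stands in for the postfix + so that no postfix-operator import is needed.

```scala
import scala.util.parsing.combinator.RegexParsers

object LoggedParser extends RegexParsers {
  // Same grammar as in the question, with each alternative wrapped in log
  // so every attempt and its outcome is printed to stdout.
  def expr: Parser[Any] = rep1(log(block)("block") | log(text)("text"))

  def text: Parser[String] = ".+?".r

  def block: Parser[Any] = "[^ ]*?\\{".r ~ expr ~ "}"
}

object LoggedParserDemo {
  def main(args: Array[String]): Unit = {
    // A short input keeps the trace readable.
    println(LoggedParser.parseAll(LoggedParser.expr, "a{b}"))
  }
}
```

In the trace for a small input such as a{b}, the block attempt opens, the inner text parser succeeds on both b and }, and the block then fails at the end of the input where it expected }.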