
I'm learning to write a simple parser with parser combinators. I'm writing the rules from the bottom up and writing unit tests to verify them as I go. However, I'm stuck on using repsep() with whitespace as the separator.

import scala.util.parsing.combinator.RegexParsers

object MyParser extends RegexParsers {
  lazy val listVal: Parser[List[String]] = elem('{') ~> repsep("""\d+""".r, """\s+""".r) <~ elem('}')
}

The rule was simplified to illustrate the problem. When I feed the parser with "{1 2 3}", it always complains that it doesn't match:

[1.4] failure: `}' expected but 2 found

What is the correct way to write a rule like the one I described?

Thanks


Answer


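By default, RegexParsers-derived parsers skip whitespace before attempting to match any string literal or regular expression. That is why your rule fails: by the time the separator """\s+""".r is tried, the space after 1 has already been skipped, so the separator sees 2 and fails, repsep stops after the first element, and the parser then complains that it expected } but found 2. Unless your notion of whitespace is unusual, you can simply rely on this built-in skipping. If the characters (or character sequences) you want to treat as ignorable whitespace differ from the default (\s+), override the protected val whiteSpace: Regex = ... member in your RegexParsers parser. If you do not want any whitespace skipping at all, override def skipWhitespace = false.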
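To make the two overrides above concrete, here is a minimal sketch. It assumes the scala-parser-combinators module is on the classpath (it has been a separate library since Scala 2.11). The object names and the "spaces and tabs only" whitespace rule are hypothetical, chosen only for illustration. The second object shows that the question's original repsep rule does work once whitespace skipping is switched off, because the separator regex then actually gets to see the spaces.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical example: only spaces and tabs count as ignorable whitespace,
// so a newline between numbers is NOT skipped.
object SpacesOnlyParser extends RegexParsers {
  override protected val whiteSpace = """[ \t]+""".r
  lazy val numbers: Parser[List[String]] = rep("""\d+""".r)
}

// Hypothetical example: no automatic whitespace skipping at all, so the
// whitespace separator must be matched explicitly, as in the question.
object NoSkipParser extends RegexParsers {
  override def skipWhitespace = false
  lazy val listVal: Parser[List[String]] =
    elem('{') ~> repsep("""\d+""".r, """\s+""".r) <~ elem('}')
}

object WhitespaceDemo {
  def main(args: Array[String]): Unit = {
    println(SpacesOnlyParser.parseAll(SpacesOnlyParser.numbers, "1 2\t3"))
    println(SpacesOnlyParser.parseAll(SpacesOnlyParser.numbers, "1\n2"))
    println(NoSkipParser.parseAll(NoSkipParser.listVal, "{1 2 3}"))
  }
}
```

With skipping disabled, every space in the input must be accounted for by the grammar, so "{ 1 2 3 }" would fail with NoSkipParser unless you also match the spaces next to the braces.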

Edit: So yes, changing this:

repsep("""\d+""".r,"""\s+""".r)

to this:

rep("""\d+""".r)

and leaving everything else defined in RegexParsers unchanged should do what you want.
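A complete version of the corrected rule might look like the sketch below, again assuming scala-parser-combinators is available. One deliberate change from the question: the braces are written as the string literals "{" and "}" (converted implicitly by RegexParsers' literal parser, which skips leading whitespace) rather than elem('{') and elem('}'), so inputs with spaces next to the braces are accepted too. The demo object name is hypothetical.

```scala
import scala.util.parsing.combinator.RegexParsers

object MyParser extends RegexParsers {
  // No explicit separator: each regex and string-literal parser skips any
  // leading whitespace (default pattern \s+) before it tries to match.
  lazy val listVal: Parser[List[String]] = "{" ~> rep("""\d+""".r) <~ "}"
}

object MyParserDemo {
  def main(args: Array[String]): Unit = {
    // Single spaces, as in the question.
    println(MyParser.parseAll(MyParser.listVal, "{1 2 3}"))
    // Mixed spaces and a newline, plus spaces next to the braces.
    println(MyParser.parseAll(MyParser.listVal, "{ 1  2\n3 }"))
  }
}
```

Both calls should produce List(1, 2, 3) as the parsed result, since any run of whitespace between tokens is consumed by the built-in skipping.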

By the way, the common use of repsep is for things like comma-separated lists where you need to ensure the commas are there but don't need to keep them in the resulting parse tree (or AST).
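For comparison, a typical repsep use might look like this sketch (the object name and the conversion to Int are illustrative choices, not from the question). The "," separator must be present between elements, yet it does not appear in the resulting list, and whitespace around the commas is still skipped automatically.

```scala
import scala.util.parsing.combinator.RegexParsers

object CommaListParser extends RegexParsers {
  // Each number is converted to Int as it is parsed.
  lazy val number: Parser[Int] = """\d+""".r ^^ (_.toInt)
  // Commas are required between elements but discarded from the result.
  lazy val listVal: Parser[List[Int]] = "{" ~> repsep(number, ",") <~ "}"
}

object CommaListDemo {
  def main(args: Array[String]): Unit = {
    println(CommaListParser.parseAll(CommaListParser.listVal, "{1, 2, 3}"))
    // Fails: without commas, repsep stops after the first element.
    println(CommaListParser.parseAll(CommaListParser.listVal, "{1 2 3}"))
  }
}
```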