
I wrote the following parser with the intent of failing on whitespace:

import scala.util.parsing.combinator._

object Foo extends JavaTokenParsers { 
  val wsTest = not(whiteSpace) // uses whiteSpace inherited from `RegexParsers`
}

Why does parsing a run of whitespace succeed?

scala> Foo.parseAll(Foo.wsTest, "          ")
res5: Foo.ParseResult[Unit] = [1.11] parsed: ()

scala> res5.successful
res6: Boolean = true

Looking at Parsers#not in the library source, I would have expected a Failure for the test above.

  /** Wrap a parser so that its failures and errors become success and
   *  vice versa -- it never consumes any input.
   */
  def not[T](p: => Parser[T]): Parser[Unit] = Parser { in =>
    p(in) match {
      case Success(_, _)  => Failure("Expected failure", in)
      case _              => Success((), in)
    }
  }
The not works correctly. My guess is that the parser skips white spaces by default and you have to disable that. Maybe this helps: stackoverflow.com/questions/3564094/… - Kigyo
"My guess is that the parser skips white spaces by default" - I've observed this behavior with a class extending JavaTokenParsers. However, I would not have expected Foo.parseAll(Foo.wsTest, " ") to have succeeded. - Kevin Meredith

1 Answer


JavaTokenParsers extends RegexParsers, which has:

 protected val whiteSpace = """\s+""".r

 def skipWhitespace = whiteSpace.toString.length > 0

 implicit def regex(r: Regex): Parser[String] = new Parser[String] {
    ... 
    val start = handleWhiteSpace(source, offset)
    ...
 }

 protected def handleWhiteSpace(source: java.lang.CharSequence, offset: Int): Int =
   if (skipWhitespace)
     (whiteSpace findPrefixMatchOf (source.subSequence(offset, source.length))) match {
       case Some(matched) => offset + matched.end
       case None => offset
     }
   else
     offset

So every regex parser skips leading whitespace before it tries to match (you can turn this off with override def skipWhitespace = false).

As a result, to the parser the input "          " is equivalent to "".

whiteSpace then tries to match "" and fails ("""\s+""" requires at least one whitespace character), and not converts that failure into a success.
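
To make the difference concrete, here is a minimal sketch that runs the parser from the question twice, once with the default behavior and once with skipWhitespace overridden. The object names Skipping, NoSkipping and Demo are made up for illustration, and it assumes the scala-parser-combinators library is on the classpath.

```scala
import scala.util.parsing.combinator._

// Same parser as in the question: whitespace skipping is left at its default (on).
object Skipping extends JavaTokenParsers {
  val wsTest: Parser[Unit] = not(whiteSpace)
}

// Hypothetical variant: identical, except that whitespace skipping is disabled.
object NoSkipping extends JavaTokenParsers {
  override def skipWhitespace = false
  val wsTest: Parser[Unit] = not(whiteSpace)
}

object Demo {
  def main(args: Array[String]): Unit = {
    val input = "          "

    // Leading whitespace is skipped, whiteSpace finds nothing to match,
    // and not turns that failure into a success.
    println(Skipping.parseAll(Skipping.wsTest, input).successful)     // true, as in the question

    // Nothing is skipped, whiteSpace matches the spaces,
    // and not turns that success into a failure.
    println(NoSkipping.parseAll(NoSkipping.wsTest, input).successful) // false
  }
}
```

Note that overriding skipWhitespace affects every regex and literal parser in the object, so the rest of a grammar would then have to handle whitespace explicitly.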