21
votes

I'm trying to wrap my head around PEG by entering simple grammars into the PEG.js playground.

Example 1:

  • Input: "abcdef1234567ghijklmn8901opqrs"
  • Desired output: ["abcdef", "1234567", "ghijklmn", "8901", "opqrs"]

  • Actual output: ["abcdef", ["1234567", ["ghijklmn", ["8901", ["opqrs", ""]]]]]

This example pretty much works, but can I get PEG.js to not nest the resulting array to a million levels? I assume the trick is to use concat() instead of join() somewhere, but I can't find the spot.

start
  = Text

Text
  = Numbers Text
  / Characters Text
  / EOF

Numbers
  = numbers: [0-9]+ {return numbers.join("")}

Characters
  = text: [a-z]+ {return text.join("")}

EOF
  = !.

Example 2:

Same problem and code as Example 1, but change the Characters rule to the following, which I expected would produce the same result.

Characters
  = text: (!Numbers .)+ {return text.join("")}

The resulting output is:

[",a,b,c,d,e,f", ["1234567", [",g,h,i,j,k,l,m,n", ["8901", [",o,p,q,r,s", ""]]]]]

Why do I get all these empty matches?

Example 3:

Last question. This doesn't work at all. How can I make it work? And for bonus points, any pointers on efficiency? For example, should I avoid recursion if possible?

I'd also appreciate a link to a good PEG tutorial. I've read (http://www.codeproject.com/KB/recipes/grammar_support_1.aspx), but as you can see I need more help ...

  • Input: 'abcdefghijklmnop"qrstuvwxyz"abcdefg'
  • Desired output: ["abcdefghijklmnop", "qrstuvwxyz", "abcdefg"]
  • Actual output: "abcdefghijklmnop\"qrstuvwxyz\"abcdefg"
start
  = Words

Words
  = Quote
  / Text
  / EOF

Quote
  = quote: ('"' .* '"') Words {return quote.join("")}

Text
  = text: (!Quote . Words) {return text.join("")}

EOF
  = !.
3
FYI: This question refers to a very old version of pegjs. Newer versions have an entirely different syntax supporting lots of useful features like text() which references the complete text string that as been matched, as in: digits = [0-9]+ { return text() }Rob Raisch

3 Answers

21
votes

I received a reply in the PEG.js Google Group that helped me onto the right track. I'm posting answers to all three problems in the hope that they can serve as a rudimentary tutorial for other PEG beginners like myself. Notice that no recursion is needed.

Example 1:

This is straightforward once you understand basic PEG idioms.

start
  = Text+

Text
  = Numbers
  / Characters

Numbers
  = numbers: [0-9]+ {return numbers.join("")}

Characters
  = text: [a-z]+ {return text.join("")}

Example 2:

The problem here is a peculiar design choice in the PEG.js parser generator for Peek expressions (&expr and !expr). Both peek ahead into the input stream without consuming any characters, so I incorrectly assumed that they didn't return anything. However, they both return an empty string. I hope the author of PEG.js changes this behavior, because (as far as I can tell) this is just unnecessary cruft that pollutes the output stream. Please correct me if I'm wrong about this!

Anyway, here is a workaround:

start
  = Text+

Text
  = Numbers
  / Words

Numbers
  = numbers: [0-9]+ {return numbers.join("")}

Words
  = text: Letter+ {return text.join("")}

Letter
  = !Numbers text: . {return text}

Example 3:

The problem is that an expression like ('"' .* '"') can never succeed. PEG is always greedy, so .* will consume the rest of the input stream and never see the second quote. Here is a solution (that incidentally needs the same Peek workaround as in Example 2).

start
  = Words+

Words
  = QuotedString
  / Text

QuotedString
  = '"' quote: NotQuote* '"' {return quote.join("")}

NotQuote
  = !'"' char: . {return char}

Text
  = text: NotQuote+ {return text.join("")}
1
votes

For current versions of pegjs, you might try:

Example One

Input: "abcdef1234567ghijklmn8901opqrs"

Desired output: ["abcdef", "1234567", "ghijklmn", "8901", "opqrs"]

{
  /**
   * Deeply flatten an array.
   * @param  {Array} arr - array to flatten
   * @return {Array} - flattened array
   */
  const flatten = (arr) =>  Array.isArray(arr) ? arr.reduce((flat, elt) => flat.concat(Array.isArray(elt) ? flatten(elt) : elt), []) : arr
}

start = result:string {
  console.log(JSON.stringify(result))
  return result
}

string = head:chars tail:( digits chars? )* {
  return flatten([head,tail])
}

chars = [a-z]+ {
  return text()
}

digits = $[0-9]+ {
  return text()
}

Example 2

Should be easy to deduce from the answer above.

Example 3

Input: 'abcdefghijklmnop"qrstuvwxyz"abcdefg'

Desired output: ["abcdefghijklmnop", "qrstuvwxyz", "abcdefg"]

{
  /**
   * Deeply flatten an array.
   * @param  {Array} arr - array to flatten
   * @return {Array} - flattened array
   */
  const flatten = (arr) =>  Array.isArray(arr) ? arr.reduce((flat, elt) => flat.concat(Array.isArray(elt) ? flatten(elt) : elt), []) : arr
}

start = result:string {
  console.log(JSON.stringify(result))
  return result
}

string = head:chars tail:quote_chars* {
  return flatten([head,tail])
}

quote_chars = DQUOTE chars:chars {
  return chars
}

chars = [a-z]+ {
  return text()
}

DQUOTE = '"'
-2
votes

Example 1

start
  = alnums

alnums
  = alnums:(alphas / numbers) {
    return alnums;
  }

alphas
  = alphas:$(alpha+)

numbers
  = numbers:$(number+)

number
  = [0-9]

alpha
  = [a-zA-Z]

Example 2

ignore

Example 3

> 'abcdefghijklmnop"qrstuvwxyz"abcdefg'.split('"')
[ 'abcdefghijklmnop',
  'qrstuvwxyz',
  'abcdefg' ]