6
votes

Recently I've been studying parsers and grammars and how they work. I was reading over the formal grammar for JSON at http://www.ietf.org/rfc/rfc4627.txt, which uses EBNF. I was pretty confident in my understanding of BNF and EBNF, but apparently I still don't fully understand it. The RFC defines a JSON object like this:

  object = begin-object [ member *( value-separator member ) ]
  end-object

I understand that the intent here is to express that any JSON object can (optionally) have a member, and then be followed by 0 or more (value-separator, member) pairs. What I don't understand is why the asterisk appears before the (value-separator member). Isn't the asterisk supposed to mimic regex, so that it appears after the item to be repeated 0 or more times? Shouldn't the JSON object grammar be written like this:

  object = begin-object [ member ( value-separator member )* ]
  end-object

3 Answers

15
votes

In the mentioned document, http://www.ietf.org/rfc/rfc4627.txt, it is stated that

The grammatical rules in this document are to be interpreted as described in [RFC4234].

RFC4234 describes ABNF (Augmented BNF), not EBNF. If you look through this document, you will find the following definition:

3.6.  Variable Repetition:  *Rule

   The operator "*" preceding an element indicates repetition.  The full
   form is:

         <a>*<b>element

   where <a> and <b> are optional decimal values, indicating at least
   <a> and at most <b> occurrences of the element.

   Default values are 0 and infinity so that *<element> allows any
   number, including zero; 1*<element> requires at least one;
   3*3<element> allows exactly 3 and 1*2<element> allows one or two.

So the notation

*( value-separator member )

is correct according to the ABNF definition, and it allows any number of repetitions, including zero.
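To make the mapping to the regex-style suffix operators concrete, here is a minimal Python sketch. The helper names (`abnf_repeat_bounds`, `to_regex_quantifier`) are made up for illustration and are not part of any RFC or library. It handles only the `<a>*<b>` prefix form quoted above, not the `<n>element` shorthand from section 3.7, and it stands in single characters for the real JSON tokens (`m` for member, `,` for value-separator).

```python
import re

# Matches an ABNF repetition prefix of the form <a>*<b>, where both numbers are optional.
_REPEAT = re.compile(r"^(\d*)\*(\d*)$")


def abnf_repeat_bounds(prefix):
    """Return (minimum, maximum) for an ABNF repetition prefix.

    The maximum is None when the prefix leaves it unbounded.
    """
    m = _REPEAT.match(prefix)
    if m is None:
        raise ValueError("not an <a>*<b> repetition prefix: %r" % prefix)
    lo = int(m.group(1)) if m.group(1) else 0      # default minimum is 0
    hi = int(m.group(2)) if m.group(2) else None   # default maximum is infinity
    return lo, hi


def to_regex_quantifier(prefix):
    """Translate an ABNF prefix into the equivalent regex suffix quantifier."""
    lo, hi = abnf_repeat_bounds(prefix)
    if hi is None:
        if lo == 0:
            return "*"
        if lo == 1:
            return "+"
        return "{%d,}" % lo
    if lo == hi:
        return "{%d}" % lo
    return "{%d,%d}" % (lo, hi)


# Toy version of the RFC 4627 object rule:
#   object = begin-object [ member *( value-separator member ) ] end-object
# with "{" / "}" as begin/end, "m" as a member, and "," as the separator.
object_re = re.compile(r"\{(?:m(?:,m)" + to_regex_quantifier("*") + r")?\}")

for text in ["{}", "{m}", "{m,m,m}", "{m,}", "{,m}"]:
    print(text, bool(object_re.fullmatch(text)))
```

In other words, the ABNF prefix `*( value-separator member )` and the suffix form `( value-separator member )*` from the question describe the same language; only the position of the operator differs.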

9
votes

Syntax is about the way somebody chooses to write down concrete entities to represent something.

I'll agree that putting the Kleene star before the entity to be repeated is non-standard, and the authors' choice to do that simply confuses people who are used to the convention. But it is perfectly valid; the authors get to define what the syntax means, and you, the user of the standard, just get to accept it.

There's some argument for putting the Kleene star where they did; it indicates that a list follows, at a point where you might expect a list. The suffix-style Kleene star indicates the same thing, but it comes as something of a surprise: first you read the list element (from left to right), then you discover the star.

As a practical matter, the surprise factor of post-Kleene-star isn't enough in general to outweigh the surprise factor of violating convention. But the authors of that standard made their choice.

Welcome to syntax.

1
votes

The nice thing about standards is that there are so many to choose from.

Apparently, Niklaus Wirth was wondering the same thing as you thirty-some years ago:

The population of programming languages is steadily growing, and there is no end of this growth in sight. Many language definitions appear in journals, many are found in technical reports, and perhaps an even greater number remains confined to proprietory circles. After frequent exposure to these definitions, one cannot fail to notice the lack of “common denominators.” The only widely accepted fact is that the language structure is defined by a syntax. But even notation for syntactic description eludes any commonly agreed standard form, although the underlying ancestor is invariably the Backus-Naur Form of the Algol 60 report. As variations are often only slight, they become annoying for their very lack of an apparent motivation.

Yes, the notation used in RFC 4627 is less common, but it is not incomprehensible.