Crockford's Top Down Operator Precedence

Question

Out of interest, I want to learn how to write a parser for a simple language, and ultimately an interpreter for my own little code-golfing language, once I understand how such things work in general.

So I started reading Douglas Crockford's article Top Down Operator Precedence.

Note: You should probably read the article if you want a deeper understanding of the context of the code snippets below.

I have trouble understanding how the var statement and the assignment operator = should work together.

D.C. defines an assignment operator like

var assignment = function (id) {
    return infixr(id, 10, function (left) {
        if (left.id !== "." && left.id !== "[" &&
                left.arity !== "name") {
            left.error("Bad lvalue.");
        }
        this.first = left;
        this.second = expression(9);
        this.assignment = true;
        this.arity = "binary";
        return this;
    });
};
assignment("=");

Note: [[value]] refers to a token, simplified to its value

Now if the expression function reaches e.g. [[t]], [[=]], [[2]], the result of [[=]].led is something like this:

{
    "arity": "binary",
    "value": "=",
    "assignment": true, //<-
    "first": {
        "arity": "name",
        "value": "t"
    },
    "second": {
        "arity": "literal",
        "value": "2"
    }
}

D.C. makes the assignment function because

we want it to do two extra bits of business: examine the left operand to make sure that it is a proper lvalue, and set an assignment member so that we can later quickly identify assignment statements.

Which makes sense to me up to the point where he introduces the var statement, which is defined as follows.

The var statement defines one or more variables in the current block. Each name can optionally be followed by = and an initializing expression.

stmt("var", function () {
    var a = [], n, t;
    while (true) {
        n = token;
        if (n.arity !== "name") {
            n.error("Expected a new variable name.");
        }
        scope.define(n);
        advance();
        if (token.id === "=") {
            t = token;
            advance("=");
            t.first = n;
            t.second = expression(0);
            t.arity = "binary";
            a.push(t);
        }
        if (token.id !== ",") {
            break;
        }
        advance(",");
    }
    advance(";");
    return a.length === 0 ? null : a.length === 1 ? a[0] : a;
});

Now if the parser reaches a set of tokens like [[var]],[[t]],[[=]],[[1]] the generated tree would look something like.

{
    "arity": "binary",
    "value": "=",
    "first": {
        "arity": "name",
        "value": "t"
    },
    "second": {
        "arity": "literal",
        "value": "1"
    }
}

The key part of my question is the if (token.id === "=") {...} part.

I don't understand why we call

    t = token;
    advance("=");
    t.first = n;
    t.second = expression(0);
    t.arity = "binary";
    a.push(t);

rather than

    t = token;
    advance("=");
    t.led(n);
    a.push(t);

in the ... part.

That would call our [[=]] operator's led function (the one set up by the assignment function), which does

make sure that it is a proper lvalue, and set an assignment member so that we can later quickly identify assignment statements.

For example:

{
    "arity": "binary",
    "value": "=",
    "assignment": true,
    "first": {
        "arity": "name",
        "value": "t"
    },
    "second": {
        "arity": "literal",
        "value": "1"
    }
}

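Since there is no operator with an lbp between 0 and 10, calling expression(0) vs. expression(9) makes no difference. The loop in expression continues while rbp < token.lbp: for a token with lbp 0, neither 0 < 0 nor 9 < 0 holds, and for a token with lbp 10, both 0 < 10 and 9 < 10 hold.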
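The claim about binding powers can be checked mechanically. The following is only a sketch: the list of lbp values is an assumption made for illustration (0 for delimiters such as ";" and ",", 10 for "=", larger values for everything else), and keepsGoing is a hypothetical helper that mirrors the rbp < token.lbp loop condition.

```javascript
// Left binding powers assumed for this sketch: 0 for delimiters,
// 10 for "=", and larger values for the remaining operators.
const assumedLbps = [0, 10, 20, 30, 40, 50, 60, 80];

// expression(rbp) keeps consuming operators while rbp < token.lbp.
const keepsGoing = (rbp, lbp) => rbp < lbp;

// With no lbp strictly between 0 and 10, both calls make the same decision
// for every token they can meet.
const sameDecision = assumedLbps.every(
    (lbp) => keepsGoing(0, lbp) === keepsGoing(9, lbp)
);

// A hypothetical operator with lbp 5 would break the equivalence.
const wouldDiffer = keepsGoing(0, 5) !== keepsGoing(9, 5);

console.log(sameDecision, wouldDiffer); // true true
```

So the equivalence holds only as long as the grammar never gains an operator whose lbp falls in the range 1 to 9.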

And the token.id === "=" condition prevents assignments to an object member, as token.id would then be either '[' or '.' and t.led wouldn't be called.

My question in short is:

Why do we not call the led function of the assignment operator that may optionally follow a variable declaration, and instead set the first and second members of the statement manually, without setting the assignment member?

Here are two fiddles parsing a simple string: one using the original code and one using the assignment operator's led.
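For readers without access to the fiddles, here is a heavily reduced, self-contained sketch of the two variants. It is not the article's code: parseVarStatement, its useLed flag, the tiny tokenizer, and the stand-alone led helper are all hypothetical, and only names, number literals, "=", "," and ";" are supported.

```javascript
// Minimal sketch of the two "var" variants; illustrative names throughout.
function parseVarStatement(source, useLed) {
    // Tokenize into names, number literals, "var", and the symbols = , ;
    const tokens = source.match(/[A-Za-z_]\w*|\d+|[=,;]/g).map(function (text) {
        if (text === "var") return { id: "var", lbp: 0 };
        if (/^\d/.test(text)) return { id: "(literal)", arity: "literal", value: text, lbp: 0 };
        if (/^[=,;]$/.test(text)) return { id: text, value: text, lbp: text === "=" ? 10 : 0 };
        return { id: "(name)", arity: "name", value: text, lbp: 0 };
    });
    tokens.push({ id: "(end)", lbp: 0 });

    let pos = 0;
    let token = tokens[pos];

    function advance(id) {
        if (id && token.id !== id) throw new Error("Expected '" + id + "'.");
        token = tokens[++pos];
    }

    // Stand-in for the led that assignment("=") installs in the article.
    function led(t, left) {
        if (left.arity !== "name") throw new Error("Bad lvalue.");
        t.first = left;
        t.second = expression(9);
        t.assignment = true;
        t.arity = "binary";
        return t;
    }

    function expression(rbp) {
        let left = token;
        if (left.arity !== "name" && left.arity !== "literal") {
            throw new Error("Unexpected token.");
        }
        advance();
        while (rbp < token.lbp) {   // only "=" has lbp > 0 in this sketch
            const t = token;
            advance();
            left = led(t, left);
        }
        return left;
    }

    advance("var");
    const a = [];
    while (true) {
        const n = token;
        if (n.arity !== "name") throw new Error("Expected a new variable name.");
        advance();
        if (token.id === "=") {
            const t = token;
            advance("=");
            if (useLed) {
                led(t, n);                 // proposed variant
            } else {
                t.first = n;               // variant from the article
                t.second = expression(0);
                t.arity = "binary";
            }
            a.push(t);
        }
        if (token.id !== ",") break;
        advance(",");
    }
    advance(";");
    return a.length === 0 ? null : a.length === 1 ? a[0] : a;
}

const manual = parseVarStatement("var t = 1;", false);
const viaLed = parseVarStatement("var t = 1;", true);
console.log(manual.assignment, viaLed.assignment); // undefined true
```

In this reduced setting the two trees have the same first, second and arity members and differ only in the assignment flag.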

You might also get interesting responses to this over on the Computer Science StackExchange site: cs.stackexchange.com — glenatron
@glenatron Hey, thanks for the suggestion. Interesting, I didn't even know about the site. Could I just add a clone of the question to cs? Or would that be considered rude? — Moritz Roessler
I would do that, for sure - the type of answers on the different sites is fairly distinct and I suspect you will get some interesting angles there. — glenatron
@glenatron, for future reference: please don't suggest that people cross-post their question on other StackExchange sites (especially not just a day or two after posting here). Generally we encourage people to either flag the question for migration, or wait a while (a week?) for answers on the original site. It doesn't benefit anyone to have multiple copies of the same question floating around. — D.W.
@glenatron, The "avoid cross-posting, especially very soon after posting on one site" is a StackExchange policy. I'm not the source of it; I'm just reporting on it. That said, there are good reasons for the policy. This isn't the place to debate the policy (that'd be Meta.SO), but you should inform yourself about the reasons for the policy, and you should comply with it while it exists. Reference: meta.stackexchange.com/q/64068/160917, meta.cs.stackexchange.com/a/673/755. — D.W.

Benjamin Gruenbaum · Accepted Answer · 2013-09-23T10:56:16

When parsing a language, two things matter: semantics and syntax.

Semantically, var x=5; and var x;x=5 seem very close if not identical, since in both cases a variable is first declared and then a value is assigned to that declared variable. This is what you've observed, and it is correct for the most part.

Syntactically however, the two differ (which is clearly visible).

In natural language, an analogue would be:

The boy has an apple.
There is an apple, the boy has it.

Now to be concise! Let's look at the two examples.

While the two (pretty much) mean the same thing, they are clearly not the same sentence. Back to JavaScript!

The first one: var x=5 is read the following way:

var                      x              =                  5
-----------------------VariableStatement--------------------
var -------------------        VariableDeclarationList
var -------------------        VariableDeclaration
var            Identifier -------   Initialiser(opt)
var ------------------- x              = AssignmentExpression
var ------------------- x ------------ = ConditionalExpression
var ------------------- x ------------ = LogicalORExpression
var ------------------- x ------------ = LogicalANDExpression
var ------------------- x ------------ = BitwiseORExpression
var ------------------- x ------------ = BitwiseXORExpression
var ------------------- x ------------ = BitwiseANDExpression
var ------------------- x ------------ = EqualityExpression
var ------------------- x ------------ = RelationalExpression
var ------------------- x ------------ = ShiftExpression
var ------------------- x ------------ = AdditiveExpression
var ------------------- x ------------ = MultiplicativeExpression
var ------------------- x ------------ = UnaryExpression
var ------------------- x ------------ = PostfixExpression
var ------------------- x ------------ = LeftHandSideExpression
var ------------------- x ------------ = NewExpression
var ------------------- x ------------ = MemberExpression
var ------------------- x ------------ = PrimaryExpression
var ------------------- x ------------ = Literal
var ------------------- x ------------ = NumericLiteral
var ------------------- x ------------ = DecimalLiteral
var ------------------- x ------------ = DecimalIntegerLiteral
var ------------------- x ------------ = NonZeroDigit
var ------------------- x ------------ = 5

Phew! All this had to happen syntactically to parse var x = 5. Sure, a lot of it is handling expressions, but it is what it is. Let us check the other version.

This breaks into two statements: var x; x = 5. The first one is:

var                      x 
--------VariableStatement---
var ---- VariableDeclarationList 
var ---- VariableDeclaration
var                 Identifier (optional initializer not present)
var                      x

The second part is x=5, which is an assignment (in the grammar, an expression statement containing an assignment expression). I could go on with the same expression madness, but it's pretty much the same.

So in conclusion, while the two produce the same result semantically, they are syntactically different, as the official language grammar specifies. The result, in this case, is indeed the same.
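To make the syntactic difference concrete, here is a sketch of the two programs as syntax trees. The node shapes follow the ESTree convention, which is my choice for illustration and not something the grammar above prescribes. The trees are written by hand and abbreviated, not captured from a real parser, and id, num and shape are hypothetical helpers.

```javascript
// Hand-written, abbreviated ESTree-style trees; not output from a real parser.
const id = (name) => ({ type: "Identifier", name });
const num = (value) => ({ type: "Literal", value });

// var x = 5;  -> one statement, the initialiser lives inside the declaration
const declarationWithInit = [
    { type: "VariableDeclaration", kind: "var",
      declarations: [{ type: "VariableDeclarator", id: id("x"), init: num(5) }] }
];

// var x; x = 5;  -> two statements, the second one is an expression statement
const declarationThenAssignment = [
    { type: "VariableDeclaration", kind: "var",
      declarations: [{ type: "VariableDeclarator", id: id("x"), init: null }] },
    { type: "ExpressionStatement",
      expression: { type: "AssignmentExpression", operator: "=",
                    left: id("x"), right: num(5) } }
];

// Summarise each program by the types of its top-level statements.
const shape = (body) => body.map((node) => node.type).join(", ");

console.log(shape(declarationWithInit));       // VariableDeclaration
console.log(shape(declarationThenAssignment)); // VariableDeclaration, ExpressionStatement
```

Only the second program contains an assignment expression node. In the first, the = is part of the declaration itself.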
