2
votes

I'm wondering if there is a conventional name for the result of applying a single production rule in the course of parsing?

So say my grammar has this production:

attr -> NAME, EQUAL, TEXT

Then in the course of parsing I match that production and have a small collection of data that represents the substitution/match:

((NAME, 'foo'), (EQUAL, '='), (TEXT, 'bar'))

What would convention dictate that be called? The candidates I've collected so far are:

  • production -- I don't like this option because despite these items being what was "produced" by application of the rule, the 'attr -> NAME, EQUAL, TEXT' bit already lays legitimate claim to that term.
  • partial derivation -- I'd prefer a single word, but this makes sense to me as the entire parse result is a derivation and this is one "atom" or step of that derivation.
  • replacement -- I kind of like this one okay, but it seems a little generic.

Aside from an obsessive desire to fully articulate the solution domain, a function (perhaps match(rule, tokens)) that takes a production rule and returns the matching tokens (and/or resolved non-terminals) needs a name for what it returns (or at least its caller does).

The next step, in my case anyway, is to use these values to produce a node in the abstract syntax tree (AST), but that node is a separate object with potentially different form and additional fields.

Does someone with wider experience in parsing and compiler terminology know of a term that would be appropriate for this?

1
Are you perhaps thinking of something along the lines of the "reduce" step of shift-reduce parsers? - Damien_The_Unbeliever
@Damien_The_Unbeliever: that was my reaction, too. See my answer. - Ira Baxter
Actually, no, I don't think I've gotten that far in my parsing repetoire yet :) The parsing strategy I'm using is top-down, recursive descent. So for each nonterminal in the body of a candidate production I attempt a (recursive) match against each production rule for that nonterminal, in the order defined. The first one to match returns the result whose name cannot (yet) be mentioned :) I'm very keen on the shift-reduce strategy though, that will be next up on my learning journey once I've gotten this one sussed out. Apologies, I see now I ought to have mentioned :) - scanny
@scanny: my point was that the vocabulary from shift-reduce parsers apply ... because after you see the production, you reduce the right hand side logically to its nonterminal. Abstractly your top parser does something similar: it proposes ("calls") to parse a nonterminal (the LHS), parses the elemenst of the rules, and then is finished with the nonterminal ("returns"). That last step acts like a reduction: it is done with all the rule tokens. - Ira Baxter
Ah, now I get it, thanks for bearing with my being dense on that :) - scanny

1 Answers

3
votes

In bottom up parsing (e.g, LALR), the recognition of parts of one or more production rules is done efficiently by building a complex FSA in which states essentially encode what productions might match what has been seen up to a certain point in the input. The vocabulary:

  • the act of processing a token and making unit progress in matching part of (more than one) production, is called a shift (it is a transition in the FSA, plus recording in a parse stack that the transition has taken place)

  • after arriving (by shifting one or more times) at the apparent end of some production, the parser will execute one (or more) reductions, each of which signals the recognition of a production. Each reduction for a specific production rule R of length L, pops L entries from the parse stack, to determine the state of the FSA after the reduction. The parser then attempts to shift on a token for the left hand side of the production.

You appear to be interested in naming the status of being ready to do a reduction. There isn't a term I've seen used to name this status, but I would nominate reducible as being a reasonable term.

(What actually happens in an LALR parser in a reducible state, is that the next [as yet unshifted] input token, the lookahead, is checked against a [state-specific] lookahead set for the production, to see if one can really use the production in a reduction. [In effect, this is checking that the production is valid in the context in which it is found.] If one can, then a reduction takes place; if not, then the production does not actually apply and is ignored; there better be a valid shift on the lookahead token in this case).