
I have the following grammar:

IdentifierName ::
    IdentifierStart
    IdentifierName IdentifierPart

This grammar, given the word git, should produce the following parse tree:

                 IdentifierName
                /              \
        IdentifierName IdentifierPart
       /              \         |
IdentifierName IdentifierPart  't'
       |                |
IdentifierStart        'i'
       |
      'g'

I want to write a recursive descent parser for it. I have two options: a recursive descent parser with backtracking, or a predictive recursive descent parser. Neither of these is a table-driven parser. However, I've read that for recursive descent with backtracking I need to eliminate left recursion first, and the grammar above appears to be left-recursive.

So am I right that I either need to refactor the grammar or use a predictive algorithm?


2 Answers

3 votes

Yes, the grammar is left-recursive and thus not LL. Neither backtracking nor predictive LL parsers can handle such a grammar, so you'd either need to change the grammar or use another algorithm, such as an LR parsing algorithm.
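To illustrate the first option: the left-recursive rule can be rewritten as "one IdentifierStart followed by zero or more IdentifierParts":

IdentifierName ::
    IdentifierStart IdentifierPart*

A predictive parser then handles it with a simple loop. Here is a minimal sketch in Java; the character classes are simplified ASCII stand-ins, not the full Unicode-aware ECMAScript definitions:

```java
// Predictive parser for the rewritten, non-left-recursive rule:
//   IdentifierName :: IdentifierStart IdentifierPart*
public class IdentifierParser {
    private final String input;
    private int pos;

    public IdentifierParser(String input) { this.input = input; }

    // Simplified ASCII approximations of the ECMAScript character classes.
    static boolean isIdentifierStart(char c) {
        return Character.isLetter(c) || c == '_' || c == '$';
    }

    static boolean isIdentifierPart(char c) {
        return isIdentifierStart(c) || Character.isDigit(c);
    }

    /** Consumes one IdentifierName; returns it, or null if the input
        does not begin with an IdentifierStart. */
    public String parseIdentifierName() {
        if (pos >= input.length() || !isIdentifierStart(input.charAt(pos))) {
            return null;
        }
        int start = pos;
        pos++;                                                // IdentifierStart
        while (pos < input.length() && isIdentifierPart(input.charAt(pos))) {
            pos++;                                            // IdentifierPart*
        }
        return input.substring(start, pos);
    }
}
```

The loop is exactly where the left recursion went: each iteration corresponds to one application of `IdentifierName IdentifierPart`.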

Note that this grammar is regular, so it can actually be translated to a regular expression or directly into a finite automaton.
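Concretely, the rule collapses to "one IdentifierStart, then zero or more IdentifierParts", which is directly expressible as a regular expression. A sketch using simplified ASCII classes (the real ECMAScript classes are Unicode-aware):

```java
import java.util.regex.Pattern;

public class RegexIdentifier {
    // Simplified ASCII approximation of IdentifierStart / IdentifierPart.
    static final Pattern IDENTIFIER_NAME =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    public static void main(String[] args) {
        System.out.println(IDENTIFIER_NAME.matcher("git").matches());   // true
        System.out.println(IDENTIFIER_NAME.matcher("1git").matches());  // false
    }
}
```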

When writing a real-world implementation of JavaScript, lexical rules such as this one would be handled by the lexer, and only the syntactic rules by the parser (though those are specified left-recursively as well, so they'd also have to be rewritten to be parsed by an LL parser).

0 votes

This tends to be the work of the lexer, not the parser. Normally, a lexer proceeds one character at a time, in a loop, with a big switch statement (or the equivalent "initial character table" if data-driven).

// Excerpt from a hand-written lexer: "chInit" is the character that started
// this token, "lInitPos" its starting position, and "source" the input
// stream; all are defined earlier in the surrounding method.
// near the end of the big "switch (ch) {" statement ...
default:
    if (!isIdentifierStart(chInit))
        {
        log(Severity.ERROR, ILLEGAL_CHAR, new Object[]{quotedChar(chInit)},
                lInitPos, source.getPosition());
        }
// fall through
case 'A':case 'B':case 'C':case 'D':case 'E':case 'F':case 'G':
case 'H':case 'I':case 'J':case 'K':case 'L':case 'M':case 'N':
case 'O':case 'P':case 'Q':case 'R':case 'S':case 'T':case 'U':
case 'V':case 'W':case 'X':case 'Y':case 'Z':
case 'a':case 'b':case 'c':case 'd':case 'e':case 'f':case 'g':
case 'h':case 'i':case 'j':case 'k':case 'l':case 'm':case 'n':
case 'o':case 'p':case 'q':case 'r':case 's':case 't':case 'u':
case 'v':case 'w':case 'x':case 'y':case 'z':
case '_':
    {
    while (source.hasNext())
        {
        if (!isIdentifierPart(nextChar()))
            {
            source.rewind();
            break;
            }
        }
    String name = source.toString(lInitPos, source.getPosition());
    // ...
    }

If building by hand, I find it far easier to have a dedicated lexer (producing tokens from a stream of characters) and a dedicated parser (producing an AST from a stream of tokens) than to try to combine the two into one parser.
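As a rough sketch of that split (the token kinds and class names here are hypothetical, not from any particular implementation): the lexer owns the character-level loop, and the parser only ever sees tokens.

```java
// Minimal lexer sketch: characters in, tokens out. A parser would call
// next() repeatedly and never touch the character stream itself.
enum Kind { IDENT, NUMBER, EOF }

record Token(Kind kind, String text) { }

class Lexer {
    private final String src;
    private int pos;

    Lexer(String src) { this.src = src; }

    Token next() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;                                  // skip whitespace
        }
        if (pos >= src.length()) {
            return new Token(Kind.EOF, "");
        }
        char c = src.charAt(pos);
        int start = pos;
        if (Character.isLetter(c) || c == '_') {    // identifier
            while (pos < src.length()
                    && (Character.isLetterOrDigit(src.charAt(pos))
                        || src.charAt(pos) == '_')) {
                pos++;
            }
            return new Token(Kind.IDENT, src.substring(start, pos));
        }
        if (Character.isDigit(c)) {                 // number
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) {
                pos++;
            }
            return new Token(Kind.NUMBER, src.substring(start, pos));
        }
        throw new IllegalArgumentException("unexpected char: " + c);
    }
}
```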