
I have some ambiguous input. I want to skip one of the alternatives when a predicate inside the rule evaluates to false. (I want to check that my chain has no whitespace inside it, but I don't want to actually produce Whitespace tokens and inject them into every rule.) I know I can catch exceptions with ANTLR, but that seems to be possible only at the scope of a whole rule.
I guess I could try something in the Java code instead. For example, here is some Java code that ANTLR4 generates:

            switch ( getInterpreter().adaptivePredict(_input,117,_ctx) ) {
            ...
            case 4:
                {
                _localctx = new ChainExpressionContext(_localctx);
                _ctx = _localctx;
                _prevctx = _localctx;
                setState(907);
                chain();
                }
                break;
            ...
            case 34:
                {
                _localctx = new FunctionExpressionContext(_localctx);
                _ctx = _localctx;
                _prevctx = _localctx;
                setState(953);
                functionCallNoParen();
                }
                break;
            }

What I want to do is something like this:

boolean flag = true;
int _myalt = getInterpreter().adaptivePredict(_input,117,_ctx);
while (flag) {
            flag = false;
            switch ( _myalt ) {
            ...
            case 4:
                {
                _localctx = new ChainExpressionContext(_localctx);
                _ctx = _localctx;
                _prevctx = _localctx;
                setState(907);
                try {
                    chain();
                } catch (FailedPredicateException e) {
                    // Placeholder condition, not real code:
                    if (/* adaptivePredict for this rule also reported an ambiguity */) {
                        flag = true;
                        _myalt = 34;
                        continue;
                    }
                }
                }
                break;
            ...
            }
}

Is this even possible? (I mean, could code like this break ANTLR's parsing somehow?) Or does ANTLR offer a better approach for this, such as custom error handling?

EDIT

For example, I have this grammar:

chain
    : chainBase memberAccess*
    ;

expression
    : ...                                  
    | chain                                                                  
    ...
    | functionCallNoParen                                                   
    ;

I would like to parse the ambiguous phrases (to a parser that reads a single channel, where HIDDEN tokens are ignored by default, these inputs look identical)

put (123).abc
put(123).abc

differently depending on the whitespace inside them (the first is functionCallNoParen, the second is chain), so I could try something like

chain
    : chainBase {!isCurrentTokenAWhitespace()}? memberAccess*
    ;

and this is where the problem described above arises.
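To make the idea behind a helper like isCurrentTokenAWhitespace() concrete, here is a minimal standalone sketch that does not use the ANTLR runtime at all. It models a token stream that keeps whitespace on a hidden channel and asks whether the token directly before a given position is hidden. The class WhitespaceLookbehind, the Tok type, and the toy tokenizer are all hypothetical and exist only for illustration; in a real parser the same check would presumably go through the parser's token stream (for example, fetching the token at the current token index minus one and inspecting its channel).

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone model (not ANTLR): a token list that keeps hidden whitespace tokens. */
final class WhitespaceLookbehind {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 1;

    /** Minimal stand-in for a lexer token: its text and the channel it was sent to. */
    static final class Tok {
        final String text;
        final int channel;

        Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    /** Toy tokenizer: alphanumeric runs, single punctuation chars, hidden whitespace runs. */
    static List<Tok> tokenize(String input) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
                out.add(new Tok(input.substring(start, i), HIDDEN_CHANNEL));
            } else if (Character.isLetterOrDigit(c)) {
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                out.add(new Tok(input.substring(start, i), DEFAULT_CHANNEL));
            } else {
                i++;
                out.add(new Tok(input.substring(start, i), DEFAULT_CHANNEL));
            }
        }
        return out;
    }

    /** True if the token directly before index `current` sits on the hidden channel. */
    static boolean hiddenTokenBefore(List<Tok> tokens, int current) {
        return current > 0 && tokens.get(current - 1).channel == HIDDEN_CHANNEL;
    }

    /** Index of the first token with the given text, or -1 if there is none. */
    static int indexOfText(List<Tok> tokens, String text) {
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).text.equals(text)) return i;
        }
        return -1;
    }
}
```

With this model, the "(" in put (123).abc has a hidden token directly before it, while the "(" in put(123).abc does not, which is exactly the distinction the predicate is meant to expose.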

Comment by Silverlight777: This should be enough.

1 Answer


I have a similar issue in my MySQL grammar, where whitespace decides whether I am dealing with a keyword followed by an open parenthesis or with a function call. For that I have a predicate that turns a keyword into a normal identifier, depending on the existence of one or more whitespace characters (which is ultimately controlled by an SQL mode). In your case you could do the same with your put keyword. Here's an example:

ADDDATE_SYMBOL: A D D D A T E { setType(determineFunction(ADDDATE_SYMBOL)); }; // MYSQL-FUNC

The single-letter rules are there only to allow case-insensitive keywords (e.g. A: 'A' | 'a';). You can see the full grammar here: https://github.com/mysql/mysql-workbench/blob/8.0/library/parsers/grammars/MySQLLexer.g4.

The function setType comes from the ANTLR4 runtime (here, the lexer instance), and determineFunction is a member function of my custom lexer class, defined as:

size_t MySQLBaseLexer::determineFunction(size_t proposed) {
  // Skip any whitespace character if the sql mode says they should be ignored,
  // before actually trying to match the open parenthesis.
  if (isSqlModeActive(IgnoreSpace)) {
    size_t input = _input->LA(1);
    while (input == ' ' || input == '\t' || input == '\r' || input == '\n') {
      getInterpreter<atn::LexerATNSimulator>()->consume(_input);
      channel = HIDDEN;
      type = MySQLLexer::WHITESPACE;
      input = _input->LA(1);
    }
  }

  return _input->LA(1) == '(' ? proposed : MySQLLexer::IDENTIFIER;
}
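
Since the question uses the Java target, here is a rough Java sketch of the same decision, written against a plain string rather than the ANTLR runtime so that it can be run on its own. The class KeywordOrIdentifier, the token type constants, and the pos and ignoreSpace parameters are assumptions made for illustration. Unlike the C++ function above, this version only peeks past the whitespace and does not consume it or assign it to a channel.

```java
/** Standalone sketch (not ANTLR): keep a keyword type only if '(' follows it. */
final class KeywordOrIdentifier {
    // Hypothetical token type numbers, chosen only for this example.
    static final int IDENTIFIER = 1;
    static final int PUT_SYMBOL = 2;

    /** Same whitespace set as the C++ loop above: space, tab, CR, LF. */
    static boolean isSpace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    /**
     * @param input       the full input text
     * @param pos         index of the first character after the matched keyword
     * @param proposed    the keyword token type the lexer rule matched
     * @param ignoreSpace if true, whitespace between keyword and '(' is skipped
     * @return proposed if an open parenthesis follows, otherwise IDENTIFIER
     */
    static int determineFunction(String input, int pos, int proposed, boolean ignoreSpace) {
        int i = pos;
        if (ignoreSpace) {
            // Look past whitespace before testing for the open parenthesis.
            while (i < input.length() && isSpace(input.charAt(i))) {
                i++;
            }
        }
        boolean parenFollows = i < input.length() && input.charAt(i) == '(';
        return parenFollows ? proposed : IDENTIFIER;
    }
}
```

For the inputs from the question, determineFunction("put(123).abc", 3, PUT_SYMBOL, false) keeps the keyword type, while determineFunction("put (123).abc", 3, PUT_SYMBOL, false) falls back to IDENTIFIER, so the parser sees two different token sequences and the ambiguity disappears before parsing starts.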