3
votes

I have a grammar and everything works fine until this portion:

lexp
: factor ( ('+' | '-') factor)* 
;

factor :('-')? IDENT;

This of course introduces an ambiguity. For example a-a can be matched by either Factor - Factor or Factor -> - IDENT

I get the following warning stating this:

[18:49:39] warning(200): withoutWarningButIncomplete.g:57:31: 
Decision can match input such as "'-' {IDENT, '-'}" using multiple alternatives: 1, 2

How can I resolve this ambiguity? I just don't see a way around it. Is there some kind of option that I can use?

Here is the full grammar:

program 
    : includes decls (procedure)*
    ;

/* Check if correct! */
includes
    :  ('#include' STRING)* 
    ;

decls
    : (typedident ';')*
    ;

typedident
: ('int' | 'char') IDENT    
;

procedure
    : ('int' | 'char') IDENT '(' args ')' body
    ;

args
: typedident (',' typedident )*  /* Check if correct! */
|   /* epsilon */ 
;

body
: '{' decls stmtlist '}'
;   

stmtlist
: (stmt)*;

stmt  

:  '{' stmtlist '}'
| 'read' '(' IDENT ')' ';'
| 'output' '(' IDENT ')' ';'
| 'print' '(' STRING ')' ';'
| 'return' (lexp)* ';'
| 'readc' '(' IDENT ')' ';'
| 'outputc' '(' IDENT ')' ';'
|  IDENT '(' (IDENT ( ',' IDENT )*)? ')' ';'
| IDENT '=' lexp ';';



lexp
: term (( '+' | '-' ) term) * /*Add in | '-'  to reveal the warning! !*/
;

term 
    : factor (('*' | '/' | '%') factor )*
;  


factor : '(' lexp ')' 
 | ('-')? IDENT
 | NUMBER;




fragment DIGIT
: ('0' .. '9')
;

IDENT : ('A' .. 'Z' | 'a' .. 'z') (( 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '_'))* ;


NUMBER
: ( ('-')? DIGIT+)
;

CHARACTER
: '\'' ('a' .. 'z' | 'A' .. 'Z'  | '0' .. '9' | '\\n' |  '\\t'  | '\\\\'  | '\\'  | 'EOF'  |'.' | ',' |':'  )  '\'' /* IS THIS COMPLETE? */
;
1
Does the error occur with just this simple grammar? I'm not familiar with ANTLR, and I must be missing something obvious here, but this does not seem ambiguous to me. A minus immediately following a factor should always be interpreted as the binary operator here. A second minus, if present, will then be the optional unary minus. Parsing "factor - factor" as "factor" "-IDENT" would not be valid syntax according to the grammar since you can't transition from the first factor without consuming either + or -.500 - Internal Server Error
What do you mean by "add in | '-'"? Add it where? Can you post the ambiguous rule instead?Bart Kiers
I've removed that comment. As it is, these rules are ambigious. Here's a the Lexer rule if it helps: IDENT : ('A' .. 'Z' | 'a' .. 'z') (( 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '_'))* ;Awoken
Nope, a-a cannot be parsed in more than 1 way. At least, not with the rules you posted: these are not ambiguous. You may see some error or warning where these rules are a part of, but it's not those two. Try it yourself with just these rules. I suspect you have something like lexp+ elsewhere, which is the culprit.Bart Kiers
Yes, I had an error at | 'return' (lexp)* ';' ... thank you so much!Awoken

1 Answers

2
votes

As mentioned in the comments: these rules are not ambiguous:

lexp
 : factor (('+' | '-') factor)* 
 ;

factor : ('-')? IDENT;

This is the cause of the ambiguity:

'return' (lexp)* ';'

which can parse the input a-b in two different ways:

  1. a-b as a single binary expression
  2. a as a single expression, and -b as an unary expression

You will need to change your grammar. Perhaps add a comma in multiple return values? Something like this:

'return' (lexp (',' lexp)*)? ';'

which will match:

return;
return a;
return a, -b;
return a-b, c+d+e, f;
...