7
votes

I have written a grammar for a vaguely Java-like DSL. While there are still some issues with it (it doesn't recognize all the inputs the way I want it to), what concerns me most is that the generated C code does not compile.

I use ANTLRWorks 1.5 with ANTLR 3.5 (ANTLR 4 apparently does not support the C target).

The problem is with the expression rules. I have rules prio14Expression down to prio0Expression, which handle operator precedence. The problem is at priority 2, which handles prefix and postfix operators:

...
prio3Expression: prio2Expression (('*' | '/' | '%') prio2Expression)*;

prio2Expression: ('++' | '--' | '!' | '+' | '-')* prio1Expression ('++' | '--')*;  

prio1Expression:
    prio0Expression (
        ('.' prio0Expression) |
        ('(' (expression (',' expression)*)? ')') |
        ('[' expression (',' expression)* ']')
    )*;

prio0Expression: 
    /*('(') => */('(' expression ')') |
    IDENTIFIER |
    //collectionLiteral |
    coordinateLiteral |
    'true' |
    'false' |
    NUMBER |
    STRING 
    ;
...

The expression rule simply refers to prio14Expression. You can see the full grammar here.

Code generation itself succeeds (without any errors or serious warnings). It produces the following code:

CONSTRUCTEX();
EXCEPTION->type         = ANTLR3_MISMATCHED_SET_EXCEPTION;
EXCEPTION->name         = (void *)ANTLR3_MISMATCHED_SET_NAME;
EXCEPTION->expectingSet = &FOLLOW_set_in_prio2Expression962;

RECOVERFROMMISMATCHEDSET(&FOLLOW_set_in_prio2Expression962);
goto ruleprio2ExpressionEx;

This fails to build with the error "Error 5 error C2065: 'FOLLOW_set_in_prio2Expression962' : undeclared identifier".

Did I do something wrong in the grammar? No other rule causes this error, and if I reformulate the rule in question, the generated code is valid (but then the grammar no longer does what I want). What can I do to fix this issue?

Thanks for any help.

2
To me it looks like a code generation problem. There are several problems in the C target that can lead to compiler errors. Try reformulating your rule, for example by extracting the operators into their own rule and using explicit tokens for the string literals (i.e. use token definitions instead of specifying tokens ad hoc like '++'). It also makes it simpler to parse the resulting AST (if needed). - Mike Lischke
@MikeLischke I have tried many different variants, but I can't find a rule that both keeps the functionality and compiles. - Matěj Zábský
Maybe off-topic, but did you try to generate C++ source from this grammar? (You need the latest ANTLR3 git checkout for this.) The C++ target is quite mature now, although it still does not support AST generation. - ibre5041
@Ivan I did try the C++ target (although not with the latest git version) and it had some issues as well (I don't remember exactly what they were; I stopped investigating them when I found out AST is not supported). - Matěj Zábský
Strange, it works for me. Look at this link: github.com/ibre5041/antlr3/tree/t101/runtime/Cpp/tests . Look at test101. It compiles, but fails to parse your input file. BTW, you can create an AST even without AST support, either by using rule actions or by using rule return values. - ibre5041

2 Answers

3
votes

I encountered the same problem.

I think it happens when a parser rule consists only of simple OR-ed tokens, like this:

problem_case: problematic_rule;
problematic_rule: 'A' | 'B' ;

This doesn't happen if it is a lexer rule.

workaround1: As_lexer_rule;
As_lexer_rule: 'A' | 'B' ;

Nor does it happen if the rule is more complicated (not just simple OR-ed tokens).

workaround2: make_it_complicated_needlessly;
make_it_complicated_needlessly: 'A' | 'B' | {false}? NeverUsedRule;
NeverUsedRule: /* don't care*/ ;

(I used the semantic predicate "{false}?" for this modification. I believe it doesn't change the language the grammar accepts.)

3
votes

This seems to be an old post, but maybe it's still useful for someone (as it was for me).

I encountered the same problem with the C runtime of ANTLR 3.5.

Another easy workaround that does not change the grammar:

problem_case: problematic_rule;
problematic_rule: a='A' | b='B' ;
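
Applied to the rule from the question, the same idea might look like the sketch below. This is untested; the label names (pre1 to pre5, post1, post2) are made up for illustration, and it assumes the label trick also works for alternatives inside a (...)* subrule, not only at the top level of a rule. The labels presumably keep ANTLR from merging the alternatives into a single set match, which is the construct that references the undeclared FOLLOW_set_... identifier.

```
// Untested sketch: every operator alternative gets its own (otherwise unused)
// label. The label names are hypothetical; the set of accepted inputs should
// be the same as for the original prio2Expression rule.
prio2Expression:
    (pre1='++' | pre2='--' | pre3='!' | pre4='+' | pre5='-')*
    prio1Expression
    (post1='++' | post2='--')*;
```

If the generated C still refers to a FOLLOW_set_in_prio2Expression... name after this change, the assumption above does not hold for subrules, and moving the operator alternatives into their own labeled parser rules (as in the example above) would be the next thing to try.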