2
votes

I have a YACC grammar for parsing expressions in C++. Here is lite version:

// yacc.y
%token IDENT

%%

expr:
    call_expr
    | expr '<' call_expr
    | expr '>' call_expr
    ;

call_expr:
    IDENT
    | '(' expr ')'
    | IDENT '<' args '>' '(' args ')'
    ;

args:
    IDENT
    | args ',' IDENT
    ;

%%

When I want to support function call with template arguments, I got a shift/reduce conflict.

When we got input IDENT '<' IDENT, yacc doesn't know whether we should shift or reduce.

I want IDENT '<' args '>' '(' args ')' got higher precedence level than expr '<' call_expr, so I can parse the following exprs.

x < y
f<x>(a,b)
f<x,y>(a,b) < g<x,y>(c,d)

I see C++/C# both support this syntax. Is there any way to solve this problem with yacc?

How do I modify the .y file?

Thank you!

1
The C++ grammar is not LALR(1), so to the best of my knowledge there is no way to parse C++ code with yacc. If you use bison and switch to GLR mode, you should be able to parse it.templatetypedef
@templatetypedef: Even with GLR you cannot resolve the conflict in f<x>(a) without knowing whether f is a template or not. All GLR can do is provide you with both possible parses (which will turn into an exponential number of parses for the full program).rici
@templatetypedef: if you had the semantic information available, you could parse it without GLR. You just restrict the template expansion production to TEMPLATE_ID '<' expr_list '>'. (That means you need to information available in your lexer, so the lexer and the parser need to share a symbol table. No big deal.) (And, yes, I'm leaving out lots of details.) (And you could do it the way you suggest with GLR, but I suspect it would be even more complicated.)rici
@curimit: No. You really cannot tell which is the correct parse unless you know what f is (at least, in my example).rici
@sepp2k: possibly true. That one could be achieved with GLR, but not easily with LALR(1). However, it will occasionally produce some unexpected results: if_both(x<1, y>(x+1)*(x+1), 42). Personally, I'd go with D, which doesn't make < ambiguous.rici

1 Answers

1
votes

You want the -v option to yacc/bison. It will give you a .output file with all the info about the generated shift/reduce parser. With your grammar, bison gives you:

State 1 conflicts: 1 shift/reduce
      :
state 1

    4 call_expr: IDENT .
    6          | IDENT . '<' args '>' '(' args ')'

    '<'  shift, and go to state 5

    '<'       [reduce using rule 4 (call_expr)]
    $default  reduce using rule 4 (call_expr)

which shows you where the problem is. After seeing an IDENT, when the next token is <, it doesn't know if it should reduce that call_expr (to ultimately match the rule expr: expr '<' call_expr) or if it should shift to match rule 6.

Parsing this with only 1 token lookahead is hard, as you have two distinct meanings of the token < (less-than or open angle bracket), and which is meant depends on the later tokens.

This case is actually even worse, as it is ambiguous, as an input like

a < b > ( c )

is probably a template call with both args lists being singletons, but it could also be

( a < b ) > ( c )

so just unfactoring the grammar won't help. Your best bet is to use a more powerful parsing method, like bison's %glr-parser option or btyacc