2
votes

I would like to parse the grammar definition from a Bison/YACC .y file. The syntax of the rules is very simple (and I can ignore everything outside the grammar rules section), and I don't need any information about the semantic actions. However, even skipping the actions seems to require parsing arbitrary C code fragments to determine where each {...} block ends (since blocks can be nested, and braces can appear inside strings and comments).

Is there a shortcut to this that doesn't require parsing C?

I guess one workaround would be to ask Bison itself to strip out all callbacks and just leave the grammar rules in the file which would then be trivial to parse.


2 Answers

3
votes

If you run bison with the -v flag, it will produce a file called basename.output which starts with the grammar (without actions). It's pretty easy to parse that report. (basename is the name of the input file, or of the output file if you specify the --output option, with the extension stripped off.)

The only other way to do it is to be prepared to duplicate most of bison's parsing, which involves at least lexing C, if not fully parsing it, as well as understanding how to parse all of bison's % commands.

Note: The grammar produced by the -v option has mid-rule actions transformed into non-terminals with empty right-hand-sides. The generated non-terminals have names of the form $@<number> or @<number>, so they're easy to identify.
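
For illustration, here is a rough Python sketch of reading the grammar back out of that report. It assumes the layout I have seen in basename.output: a line reading "Grammar", then numbered rule lines such as "1 input: %empty" and "2      | input line", ending at the next section heading. The function name parse_bison_report is made up for this example, and the marker for an empty right-hand side varies between Bison versions, so the three spellings handled below are an assumption, not an exhaustive list.

```python
import re

# A rule line is "<number> <lhs>: <rhs>" or, for a further alternative
# of the same rule, "<number>      | <rhs>".
RULE_RE = re.compile(r"^\s*\d+\s+(?:(?P<lhs>[^\s:|]+):|\|)\s*(?P<rhs>.*)$")

# Names bison generates for mid-rule actions, e.g. "$@1" or "@2".
MIDRULE_RE = re.compile(r"^\$?@\d+$")

# Assumed spellings of an empty right-hand side (differs by Bison version).
EMPTY_MARKERS = (["%empty"], ["/*", "empty", "*/"], ["\u03b5"])


def parse_bison_report(text):
    """Return {lhs: [list of rhs symbols, ...]} from the Grammar section."""
    grammar = {}
    in_grammar = False
    lhs = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Grammar":
            in_grammar = True
            continue
        if not in_grammar or not stripped:
            continue  # skip earlier sections and blank separator lines
        m = RULE_RE.match(line)
        if m is None:
            break  # first non-rule line is the next section heading
        if m.group("lhs"):
            lhs = m.group("lhs")
        # Naive whitespace split: a literal containing a space (' ' or
        # "two words") would be split incorrectly by this sketch.
        symbols = m.group("rhs").split()
        if symbols in EMPTY_MARKERS:
            symbols = []
        grammar.setdefault(lhs, []).append(symbols)
    return grammar
```

Something like parse_bison_report(open("basename.output").read()) would then give a dictionary of rules, and MIDRULE_RE.match(name) can be used to filter out the generated mid-rule non-terminals.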

2
votes

Recognizing and skipping over C code with braces is pretty trivial in flex:

%x cblk cstr cchr ccom cppcom
%%
                       int brace_depth;
"{"                    brace_depth=0; BEGIN(cblk);
<cblk>"{"              brace_depth++;
<cblk>"}"              if (!brace_depth--) BEGIN(INITIAL);
<cblk>\"               BEGIN(cstr);
<cblk>\'               BEGIN(cchr);
<cblk>\/\*             BEGIN(ccom);
<cblk>\/\/             BEGIN(cppcom);
<cstr,cchr>\\.         ;
<cstr>\"               BEGIN(cblk);
<cchr>\'               BEGIN(cblk);
<ccom>\*\/             BEGIN(cblk);
<cppcom>\n             BEGIN(cblk);
<cblk,cchr,cstr,ccom,cppcom>.|\n   ;
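
If you would rather not pull in flex, the same state machine is easy to write by hand. Below is a minimal Python sketch of the idea (not a translation of any bison internals): it removes every top-level {...} block while ignoring braces inside string literals, character literals, and both comment styles. The function names strip_actions, skip_block and skip_quoted are made up for this example.

```python
def skip_quoted(src, i):
    """src[i] is a quote character; return the index just past its match."""
    quote = src[i]
    i += 1
    while i < len(src):
        if src[i] == "\\":
            i += 2  # skip the backslash and the escaped character
        elif src[i] == quote:
            return i + 1
        else:
            i += 1
    return len(src)  # unterminated literal: consume the rest


def skip_block(src, i):
    """src[i] is '{'; return the index just past the matching '}'."""
    depth = 0
    n = len(src)
    while i < n:
        c = src[i]
        if c in "\"'":
            i = skip_quoted(src, i)
        elif src.startswith("/*", i):
            end = src.find("*/", i + 2)
            i = n if end < 0 else end + 2
        elif src.startswith("//", i):
            end = src.find("\n", i)
            i = n if end < 0 else end + 1
        else:
            if c == "{":
                depth += 1
            elif c == "}":
                depth -= 1
                if depth == 0:
                    return i + 1
            i += 1
    return n  # unterminated block: consume the rest


def strip_actions(src):
    """Return src with every top-level {...} block removed."""
    out = []
    i = 0
    while i < len(src):
        if src[i] == "{":
            i = skip_block(src, i)
        else:
            out.append(src[i])
            i += 1
    return "".join(out)
```

Like the flex rules above, this treats any { outside an action as the start of one, so a '{' character token used directly in a grammar rule would be misread; handling that (and %{ ... %} prologue blocks) would need extra cases.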