
I'm trying to implement a C grammar parser with lex & yacc and show the reduction procedure. I have to print the token list on the left and the shift or reduce rule on the right, like this:

  2*4+4/2   // input


2                shift 2                 2
2                reduce I -> F           2
2                reduce F -> T           2
2                reduce F -> T           2
*                shift *                 2  *
4                shift 4                 2  * 4
4                reduce I -> F           2  * 4
2 * 4            reduce T*F -> T         2  * 4

8                reduce T -> E           2  * 4

8                reduce T -> E           2  * 4
+                shift +                 2  * 4  +
4                shift 4                 2  * 4  + 4
4                reduce I -> F           2  * 4  + 4
4                reduce F -> T           2  * 4  + 4
4                reduce F -> T           2  * 4  + 4
/                shift /                 2  * 4  + 4  /
2                shift 2                 2  * 4  + 4  / 2
2                reduce I -> F           2  * 4  + 4  / 2
4 / 2            reduce T/F -> T         2  * 4  + 4  / 2

8 + 2            reduce E+T -> E         2  * 4  + 4  / 2


 end of parsing : 2  * 4  + 4  / 2  = 10

I am not familiar with lex & yacc, and I have no idea how to print the procedure out. Any help is welcome.

Each action run by the bison-generated parser corresponds to a 'reduce' in the underlying parser automaton, but there's no easy way of hooking the shifts. – Chris Dodd
Are you sure you've understood your assignment? yacc -v already does this. – user207421

1 Answer


You can easily enough ask Bison to show you what it is doing. But it's not going to come out looking like your chart. You'll have to read through the trace and condense it into the desired format. But that's not too hard, and you will appreciate having learned how to do it the first time you have to debug a grammar.

I'm not going to explain here how to write a grammar, nor am I going to talk much about writing scanners. If you haven't done so already, I suggest you read through the simple examples in the bison manual, and then the chapter on semantic values. That will explain a lot of the background for the following.

Bison has some very useful tools for visualising the grammar and the parse. The first is the state/transition table produced when you give bison the --report=all command-line option.

You can use -v, which is what people will usually tell you to do. But I think --report=all is worthwhile for a novice because it comes closer to what you will have seen in class. The -v listing only shows the core items in each state, so it leaves out the items with the dot at the beginning. And it doesn't show you the lookaheads. Since it does show you all the action entries, including the GOTO actions, you can figure everything else out pretty easily. But, at least at the beginning, it's probably better to see all the details.
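Concretely, both are just extra flags on the bison command line. Assuming the grammar file used later in this answer, either of these should leave the report in simple_expr.output:

$ bison -v simple_expr.y               # core items only
$ bison --report=all simple_expr.y     # all items, plus lookaheads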

You can ask bison to draw the state machine. It produces the drawing in Graphviz ("Dot") syntax, so you need Graphviz installed to look at the drawing. And state machines for any non-trivial grammar don't fit on an A4 sheet, or a computer screen, so they're really only useful for toy grammars. Read the manual to see how to tell Bison to output the Graphviz diagram if you want to give it a try.
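If you do want to try it, the option is --graph (or -g), and something along these lines should produce a picture you can open, assuming you have Graphviz's dot tool installed (again using the grammar file from later in this answer):

$ bison --graph=simple_expr.gv simple_expr.y
$ dot -Tpng -o simple_expr.png simple_expr.gv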

You'll probably want to refer to the state machine when you're trying to understand the traces.

You could write out parsing actions by just running the state machine by hand, using the actions which Bison shows you. But there's a lot to be said for reading the bison trace. And it's really not very difficult to produce. You just need to add one more command-line option when you invoke bison, and you need to add a few lines to your grammar source file. All of the information here, and a lot more, can be found in the bison manual chapter on grammar debugging.

The option is -t or --debug. That tells Bison to generate the additional code needed to produce the traces. However, it does not enable tracing; you still have to do that by setting the global variable yydebug to 1 (or some other non-zero value). Unfortunately, yydebug is not defined unless the --debug option is specified, so if you just add yydebug = 1; to your main(), your program will no longer compile unless you run bison with --debug. That's annoying, so it's worth adding a few more lines to your code. The simplest fix is the following few lines, which can go just above your definition of main:

#if YYDEBUG
    extern int yydebug;
#else
    static int yydebug = 0;
#endif

That makes sure that yydebug is defined and usable in main regardless of whether you requested a debugging parser when you ran bison.
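(As an aside, if your Bison is reasonably recent you can also request the trace code from inside the grammar file rather than on the command line, with a %define directive in the declarations section; as far as I know it is equivalent to --debug:)

/* Equivalent to the --debug / -t command-line option (recent Bison versions). */
%define parse.trace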

But that still doesn't enable traces. To do that, you need one more line (at least) which you can put right at the top of main:

yydebug = 1;

Or you could be a bit more sophisticated and make it possible to run the parser with or without traces, by checking the command-line arguments. A good way to parse command-line arguments is with getopt, but for a quick-and-dirty executable which only has one command-line argument, you could use the sample code below, which sets yydebug only if the executable is invoked with -d as its first command line argument.
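If you want to see what the getopt route might look like, here's a minimal sketch of an alternative main() (it's not used in the files below; it assumes it lives in the grammar file, where yydebug and yyparse() are already visible, and it only understands -d):

/* Hypothetical getopt-based main(): -d on the command line
   turns on the parser traces. */
#include <unistd.h>

int main(int argc, char* argv[]) {
    int opt;
    while ((opt = getopt(argc, argv, "d")) != -1) {
        if (opt == 'd') yydebug = 1;   /* enable Bison's runtime trace */
    }
    return yyparse();
}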

This is probably pretty similar to the grammar you were given (or wrote), except that I used longer names for non-terminals.

    /*  FILE: simple_expr.y   */
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int yylex(void);
void yyerror(const char* msg);

%}

%token NUMBER
%printer { fprintf(yyo, "%d", $$); } NUMBER

%%

expr  : term
      | expr '+' term
      | expr '-' term ;
term  : factor
      | term '*' factor
      | term '/' factor ;
factor: NUMBER
      | '(' expr ')' ;

%%

#if YYDEBUG
extern int yydebug;
#else
static int yydebug = 0;
#endif

int main(int argc, char* argv[]) {
    if (argc > 1 && strcmp(argv[1], "-d") == 0) yydebug = 1;
    return yyparse();
}

void yyerror(const char* msg) {
    fprintf(stderr, "%s\n", msg);
}

We also need a lexical scanner. Here's a really simple one (see the flex manual for any details you don't understand):

    /*  FILE: simple_expr.l   */
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "simple_expr.tab.h"
%}

%option noinput nounput noyywrap nodefault

%%
[[:space:]]+        ;
[[:digit:]]+        { yylval = atoi(yytext); return NUMBER; }
.                   return yytext[0];

Compile (a Makefile would be useful here, or whatever you use for building projects):

$ bison -o simple_expr.tab.c -d --debug --report=all simple_expr.y
$ flex -o simple_expr.lex.c simple_expr.l
$ gcc -Wall -o simple_expr simple_expr.tab.c simple_expr.lex.c

You should take a look at simple_expr.output at this point. There you will find the bison state machine report.

Now we run the program with traces enabled. (<<< is what Bash calls a "here-string". It takes a single argument and provides it to the executable as its standard input. This is really handy for debugging parsers.)

The trace is quite long, because, as I said, Bison makes no attempt to compress the information. Here's how it starts:

$ ./simple_expr -d <<< '2 * 3 + 12 / 4'
Starting parse
Entering state 0
Reading a token: Next token is token NUMBER (2)
Shifting token NUMBER (2)
Entering state 1
Reducing stack by rule 7 (line 22):
   $1 = token NUMBER (2)
-> $$ = nterm factor ()
Stack now 0
Entering state 5
Reducing stack by rule 4 (line 19):
   $1 = nterm factor ()
-> $$ = nterm term ()
Stack now 0
Entering state 4

So, it first shifts the token 2 (which is a NUMBER). (Note: I snuck a %printer declaration into the grammar file so that bison can print out the semantic value of NUMBER tokens. If I hadn't done that, it would just have told me that it read a NUMBER, leaving me to guess which NUMBER it read. So the %printer declarations are really handy. But you need to read the manual to see how to use them properly.)

The shift action causes it to go to state 1. Bison does immediate reductions when the default reduction doesn't depend on lookahead, so the parser now immediately reduces the stack using the rule factor: NUMBER. (You need either the state machine or the code listing with line numbers to see what "rule 7" is. That's one of the reasons we produced the report.)

After the reduction, the stack contains only state 0, which is the state consulted for the GOTO action (on the non-terminal factor, which was just reduced). That action takes us to state 5. Again, an immediate reduction is possible, using rule 4 (term: factor). After the reduction, the stack has again been reduced to just the start state, and the GOTO action takes us to state 4. At this point, another token is actually necessary. You can read the rest of the trace below; hopefully, you can see what's going on.

Reading a token: Next token is token '*' ()
Shifting token '*' ()
Entering state 10
Reading a token: Next token is token NUMBER (3)
Shifting token NUMBER (3)
Entering state 1
Reducing stack by rule 7 (line 22):
   $1 = token NUMBER (3)
-> $$ = nterm factor ()
Stack now 0 4 10
Entering state 15
Reducing stack by rule 5 (line 20):
   $1 = nterm term ()
   $2 = token '*' ()
   $3 = nterm factor ()
-> $$ = nterm term ()
Stack now 0
Entering state 4
Reading a token: Next token is token '+' ()
Reducing stack by rule 1 (line 16):
   $1 = nterm term ()
-> $$ = nterm expr ()
Stack now 0
Entering state 3
Next token is token '+' ()
Shifting token '+' ()
Entering state 8
Reading a token: Next token is token NUMBER (12)
Shifting token NUMBER (12)
Entering state 1
Reducing stack by rule 7 (line 22):
   $1 = token NUMBER (12)
-> $$ = nterm factor ()
Stack now 0 3 8
Entering state 5
Reducing stack by rule 4 (line 19):
   $1 = nterm factor ()
-> $$ = nterm term ()
Stack now 0 3 8
Entering state 13
Reading a token: Next token is token '/' ()
Shifting token '/' ()
Entering state 11
Reading a token: Next token is token NUMBER (4)
Shifting token NUMBER (4)
Entering state 1
Reducing stack by rule 7 (line 22):
   $1 = token NUMBER (4)
-> $$ = nterm factor ()
Stack now 0 3 8 13 11
Entering state 16
Reducing stack by rule 6 (line 21):
   $1 = nterm term ()
   $2 = token '/' ()
   $3 = nterm factor ()
-> $$ = nterm term ()
Stack now 0 3 8
Entering state 13
Reading a token: Now at end of input.
Reducing stack by rule 2 (line 17):
   $1 = nterm expr ()
   $2 = token '+' ()
   $3 = nterm term ()
-> $$ = nterm expr ()
Stack now 0
Entering state 3
Now at end of input.
Shifting token $end ()
Entering state 7
Stack now 0 3 7
Cleanup: popping token $end ()
Cleanup: popping nterm expr ()
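
Once you have the trace, a quick way to cut it down to just the shift and reduce lines is to filter it. The traces go to standard error, hence the redirection; for example:

$ ./simple_expr -d <<< '2 * 3 + 12 / 4' 2>&1 | grep -E '^(Shifting|Reducing)'

From there it's still a manual job to rewrite each line into the two-column format your assignment asks for, but at least most of the noise is gone.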