2
votes

I'm trying to create a simple Pascal compiler using Flex/Bison, and I want to check what semantic values are stored within tokens. I have the following code for flex:

...
{ID}        {yylval.stringValue= strdup(yytext); return(ID);}
...

And the following code in bison:

...
program: PROGRAM ID LBRACKET identifier_list RBRACKET DELIM declarations subprogram_declarations compound_statement DOT {printf($2);}
...

And the following test file:

program example(input, output);
...

Flex and bison recognize everything and the parse succeeds, but when I try to check token values as in the code above, it has no effect:

Starting parse
Entering state 0
Reading a token: Next token is token PROGRAM ()
Shifting token PROGRAM ()
Entering state 1
Reading a token: Next token is token ID ()
Shifting token ID ()
Entering state 3

Is there a way to print the token value inside the (), like token ID (example)? I've checked similar questions and they do it the same way, so maybe I'm just missing something.

P.S. When I enable debug mode for flex, it shows that it accepted "example" by rule {ID}, but where is that "example" stored, and how should I use it afterwards?


2 Answers

0
votes

Bison cannot know by itself where the semantic values should be taken from or how to print them, so you have to define a %printer for your tokens. In your case, you have to declare the type of the token and a corresponding printer:

%token <stringValue> ID
%printer { fprintf(yyoutput, "%s", $$); } ID;

Define one printer for each token that you want to inspect in traces; then it should work as you expect.

2
votes

Flex and bison communicate semantic values through the semantic union yylval, by default a global variable. (Note 1) If a token has a semantic value, the flex action which reports that token type must set the appropriate member of the semantic union to the token's value, and bison will extract the value and place it on the parser stack.

Bison relies on user declarations to tell which union member is used for the semantic value of tokens and non-terminals (if they have semantic values). So if you have the flex action:

{ID}        {yylval.stringValue= strdup(yytext); return(ID);}

one would expect to see the following in the corresponding bison input file:

%union {
   /* ... */
   char* stringValue;
}
%token <stringValue> ID

The last line tells bison that ID is a token type, and that its associated semantic type is the one with member name stringValue. Subsequently, you can refer to the semantic value of the token and bison will automatically insert the member access, so that if you have the rule:

program: PROGRAM ID LBRACKET identifier_list RBRACKET
         DELIM declarations subprogram_declarations compound_statement DOT
         { printf("%s\n", $2); /* Always use a format string in a printf! */ }

The $2 will be replaced with the equivalent of stack[frame_base + 2].stringValue.
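To make that substitution concrete, here is a minimal hand-written C sketch of the mechanism. It is not Bison's generated code: SemValue, lexer_value, value_stack, scan_identifier, shift_token and string_value_of are hypothetical names that stand in for the generated union type, yylval, the parser's value stack and the expansion of $n. The helper dup_string does the same job as strdup.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the type Bison generates from %union. */
typedef union {
    char *stringValue;
    long  numValue;
} SemValue;

/* Hypothetical stand-ins for yylval and the parser's value stack. */
static SemValue lexer_value;
static SemValue value_stack[16];
static int      stack_top = 0;   /* number of values currently on the stack */

/* Same job as strdup(): return a heap-allocated copy of text. */
static char *dup_string(const char *text) {
    size_t size = strlen(text) + 1;
    char *copy = malloc(size);
    if (copy != NULL) memcpy(copy, text, size);
    return copy;
}

/* What the scanner action does: fill in one member of the union. */
static void scan_identifier(const char *text) {
    lexer_value.stringValue = dup_string(text);
}

/* What a shift does: copy the whole union onto the value stack. */
static void shift_token(void) {
    value_stack[stack_top++] = lexer_value;
}

/* What "$n" amounts to inside a rule with rule_length right-hand-side
   symbols, when symbol n was declared with <stringValue>. */
static const char *string_value_of(int rule_length, int n) {
    int frame_base = stack_top - rule_length - 1;
    return value_stack[frame_base + n].stringValue;
}
```

Shifting a valueless token followed by an identifier scanned from "example" and then calling string_value_of(2, 2) returns "example", which is the role $2 plays in the rule above.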

However, there is little point using an action like that in a bison file, since it is easy to use bison's trace facility to see how bison is processing the token stream. When traces are enabled, the token will be recorded when it is first seen by bison, as opposed to the above rule which won't print the ID token's semantic value until the entire program has been parsed.

By default, the trace facility only prints the token type, since Bison has no idea how to print an arbitrary semantic value. However, you can define printer rules for semantic types or for specific tokens or non-terminals. These rules should print the semantic value (without delimiters) to the output stream yyoutput. In such a rule, $$ can be used to access the semantic value (and bison will fill in the member access, as above).
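As a rough illustration of what printer rules buy you, the following C sketch mimics a trace line with and without a printer. It is an assumption-laden model rather than Bison's real output code: SemValue, SymbolKind, print_value and trace_token are hypothetical names, and the message text only imitates the trace format shown in the question.

```c
#include <stdio.h>

/* Hypothetical stand-in for the %union type. */
typedef union {
    char *str;
    long  num;
} SemValue;

/* Hypothetical symbol kinds; SYM_SEMI has no printer. */
typedef enum { SYM_ID, SYM_NUM, SYM_SEMI } SymbolKind;

/* One case per symbol that has a printer rule. Each case plays the role
   of a %printer body, with value->member standing in for $$. */
static void print_value(FILE *yyoutput, SymbolKind kind, const SemValue *value) {
    switch (kind) {
    case SYM_ID:
        fprintf(yyoutput, "%s", value->str);
        break;
    case SYM_NUM:
        fprintf(yyoutput, "%ld", value->num);
        break;
    default:
        break;   /* no printer: nothing appears between the parentheses */
    }
}

/* Imitates one line of trace output for a token that was just read. */
static void trace_token(FILE *yyoutput, const char *name,
                        SymbolKind kind, const SemValue *value) {
    fprintf(yyoutput, "Next token is token %s (", name);
    print_value(yyoutput, kind, value);
    fprintf(yyoutput, ")\n");
}
```

The default branch corresponds to the empty () in the question's trace: without a printer there is nothing Bison can safely write there.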

Here's a complete example of a simple language consisting only of function calls:

File printer.y

%{
#include <stdio.h>
#include <stdlib.h>   /* free() */
#include <string.h>
int yylex(void);
%}

%defines
%define parse.trace

%union {
  char* str;
  long  num;
}

%token <str> ID
%token <num> NUM
%type <str> call
  /* Printer rules: token-specific, non-terminal-specific, and by type. */
%printer { fprintf(yyoutput, "%s", $$); }   ID
%printer { fprintf(yyoutput, "%s()", $$); } call    
%printer { fprintf(yyoutput, "%ld", $$); }  <num>

  /* Destructor rule: by semantic type */
%destructor { free($$); } <str>

%code provides {
  void yyerror(const char* msg);
}

%%

prog: %empty
    | prog stmt ';'
stmt: %empty
    | call               { free($1);  /* See Note 2 */ }
call: ID '(' args ')'    { $$ = $1;   /* For clarity; this is the default action */ }
args: %empty
    | arglist
arglist: value
       | arglist ',' value
value: NUM
     | ID                { free($1);  /* See Note 2 */ }
     | call              { free($1);  /* ditto */      }

%%
int main(int argc, char** argv) {
  if (argc > 1 && strcmp(argv[1], "-d") == 0) yydebug = 1;
  return yyparse();
}
void yyerror(const char* msg) {
  fprintf(stderr, "%s\n", msg);
}

File printer.l

%{
#include <stdlib.h>
#include "printer.tab.h"
%}
%option noinput nounput noyywrap nodefault
%%
[[:space:]]+              ;
[[:alpha:]_][[:alnum:]_]* { yylval.str = strdup(yytext); return ID; }
[[:digit:]]+              { yylval.num = strtol(yytext, NULL, 10); return NUM; }
.                         return *yytext;

To build:

bison -d -t -o printer.tab.c printer.y
flex -o printer.lex.c printer.l
gcc -Wall -ggdb -std=gnu11 -o printer printer.lex.c printer.tab.c

Notes:

  1. The semantic type doesn't have to be a union, but it's very common. See the bison manual for other options.

  2. The strdup used to create the token must be matched with a free somewhere. In this simple example, the semantic values of the ID token and the call non-terminal are only used for tracing, so they can be freed as soon as they are consumed by some other non-terminal. Bison does not invoke the destructor for tokens which are used by a parsing rule; it assumes that the programmer knows whether the token will or will not be needed. The destructor rules are used for tokens which Bison itself pops off the stack, typically in response to syntax errors.
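The ownership split described in Note 2 can be modelled with a small C sketch. This is a simplified model under stated assumptions, not Bison's implementation: value_stack, push_id, reduce_value_from_id, abort_parse and destructor_calls are hypothetical names, and the stack holds only strings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical value stack holding only heap-allocated strings. */
static char *value_stack[8];
static int   depth = 0;
static int   destructor_calls = 0;   /* counts values the "parser" discarded */

/* Same job as strdup(): return a heap-allocated copy of text. */
static char *dup_string(const char *text) {
    size_t size = strlen(text) + 1;
    char *copy = malloc(size);
    if (copy != NULL) memcpy(copy, text, size);
    return copy;
}

/* Scanner allocates, parser shifts: the stack now owns the string. */
static void push_id(const char *text) {
    value_stack[depth++] = dup_string(text);
}

/* Models an action such as { free($1); }: the rule consumed the value,
   so the action itself is responsible for freeing it. No destructor runs. */
static void reduce_value_from_id(void) {
    free(value_stack[--depth]);
}

/* Models what happens when the parser gives up: every value still on the
   stack is discarded through the destructor, here free(). */
static void abort_parse(void) {
    while (depth > 0) {
        free(value_stack[--depth]);
        destructor_calls++;
    }
}
```

In this model, a value consumed by a rule is freed exactly once by the action, and a value left on the stack at abort time is freed exactly once by the destructor, so no string is leaked or freed twice.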