Flex and bison communicate semantic values through the semantic union yylval, by default a global variable. (Note 1) If a token has a semantic value, the flex action which reports that token type must set the appropriate member of the semantic union to the token's value, and bison will extract the value and place it on the parser stack.
Bison relies on user declarations to tell which union member is used for the semantic value of tokens and non-terminals (if they have semantic values). So if you have the flex action:
{ID} {yylval.stringValue= strdup(yytext); return(ID);}
you would expect to see the following in the corresponding bison input file:
%union {
/* ... */
char* stringValue;
}
%token <stringValue> ID
The last line tells bison that ID is a token type, and that its associated semantic type is the one with member name stringValue. Subsequently, you can refer to the semantic value of the token and bison will automatically insert the member access. For example, given the rule:
program: PROGRAM ID LBRACKET identifier_list RBRACKET
DELIM declarations subprogram_declarations compound_statement DOT
{ printf("%s\n", $2); /* Always use a format string in a printf! */ }
Here, $2 is replaced with the equivalent of stack[frame_base + 2].stringValue.
However, there is little point using an action like that in a bison file, since it is easy to use bison's trace facility to see how bison is processing the token stream. When traces are enabled, the token will be recorded when it is first seen by bison, as opposed to the above rule, which won't print the ID token's semantic value until the entire program has been parsed.
By default, the trace facility only prints the token type, since Bison has no idea how to print an arbitrary semantic value. However, you can define printer rules for semantic types or for specific tokens or non-terminals. These rules should print the semantic value (without delimiters) to the output stream yyoutput. In such a rule, $$ can be used to access the semantic value (and bison will fill in the member access, as above).
Here's a complete example of a simple language consisting only of function calls:
File printer.y
%{
#include <stdio.h>
#include <stdlib.h>  /* free() */
#include <string.h>
int yylex(void);
%}
%defines
%define parse.trace
%union {
char* str;
long num;
}
%token <str> ID
%token <num> NUM
%type <str> call
/* Printer rules: token-specific, non-terminal-specific, and by type. */
%printer { fprintf(yyoutput, "%s", $$); } ID
%printer { fprintf(yyoutput, "%s()", $$); } call
%printer { fprintf(yyoutput, "%ld", $$); } <num>
/* Destructor rule: by semantic type */
%destructor { free($$); } <str>
%code provides {
void yyerror(const char* msg);
}
%%
prog: %empty
| prog stmt ';'
stmt: %empty
| call { free($1); /* See Note 2 */ }
call: ID '(' args ')' { $$ = $1; /* For clarity; this is the default action */ }
args: %empty
| arglist
arglist: value
| arglist ',' value
value: NUM
| ID { free($1); /* See Note 2 */ }
| call { free($1); /* ditto */ }
%%
int main(int argc, char** argv) {
if (argc > 1 && strcmp(argv[1], "-d") == 0) yydebug = 1;
return yyparse();
}
void yyerror(const char* msg) {
fprintf(stderr, "%s\n", msg);
}
File printer.l
%{
#include <stdlib.h>
#include "printer.tab.h"
%}
%option noinput nounput noyywrap nodefault
%%
[[:space:]]+ ;
[[:alpha:]_][[:alnum:]_]* { yylval.str = strdup(yytext); return ID; }
[[:digit:]]+ { yylval.num = strtol(yytext, NULL, 10); return NUM; }
. return *yytext;
To build:
bison -d -t -o printer.tab.c printer.y
flex -o printer.lex.c printer.l
gcc -Wall -ggdb -std=c11 -D_POSIX_C_SOURCE=200809L -o printer printer.lex.c printer.tab.c
Notes:
1. The semantic type doesn't have to be a union, but it's very common. See the bison manual for other options.
2. The strdup used to create the token must be matched with a free somewhere. In this simple example, the semantic values of the ID token and the call non-terminal are only used for tracing, so they can be freed as soon as they are consumed by some other non-terminal. Bison does not invoke the destructor for tokens which are used by a parsing rule; it assumes that the programmer knows whether the token will or will not be needed. The destructor rules are used for tokens which Bison itself pops off the stack, typically in response to syntax errors.