Pass more than one value between non-terminal Bison rules

Question

I'm trying to pass more than one value when I match a rule with my lexer (written in Flex).

{pattern_to_match}           {
                                 yylval.type_val.str=strdup(yytext);
                                 yylval.type_val.int=1;
                                 return TOKEN;
                             }

This is the lexer part

%union {
struct{
        char * str;
            int    int;
   }str_int;

%token <str_int> TOKEN

 TOKEN      {       
                printf("%s\n",$1.str_int.str);
                printf("%s\n",$1.str);
            }

And here is the Bison part. I wrote the two printf calls as seen in a tutorial, but neither works (for either the string or the int). What am I doing wrong?

torek · Accepted Answer · 2013-08-31T20:05:14

Your %union directive looks ... well, "almost OK" in terms of what you've shown of it, but there's a missing close brace. I cannot say anything about the part you omitted, but int int is a syntax error, so I have to assume that this is not what's there either.

The code in the braces (both the flex and bison sections) does not match the fragment shown in the union.

Here's some correct syntax (I added more names for discussion purposes, and some other items to make the output compile-able with gcc -O -Wall -c):

%{
#include <stdio.h>
extern int yylex(void);
extern int yyerror(const char *);
%}

%union {
    struct named_for_discussion_below {
        char *pair_sval;
        int pair_ival;
    } pair;
    int single_ival;
}

%token <pair> TOKEN
%token <single_ival> INTEGER

%%

prog: exprlist;

exprlist: exprlist expr
        | /*empty*/
        ;

expr    : TOKEN { printf("got: %s %d\n", $1.pair_sval, $1.pair_ival); }
        | INTEGER { printf("got: %d\n", $1); }
        ;

Note that, because of the types supplied in the two %token directives, bison assumes that $1 is an instance of struct named_for_discussion_below, containing pair_sval and pair_ival, when the token is TOKEN, but that $1 is just a simple single_ival value when the token is INTEGER. You must select the structure member (.pair_sval and .pair_ival) when accessing the pair value, but you must omit the word pair. When accessing the single_ival you omit the word single_ival too; and since there's no .field sub-name, nothing else appears after $1.

Extended discussion

It may help, at least if you know the basics of how the generated parser works, to note here that each element of the parse stack is a union type. (Well, it is after using %union, otherwise it's just an ordinary int.)

The %union directive supplies the contents for this type. The internal name for this is union YYSTYPE, and it has a typedef-alias spelled YYSTYPE, which is what you (or flex) should use when setting up the auxiliary value for each token. Each call to yylex() must return an ordinary int value, which is the token number (0 for EOF, 1 through 255 for an ordinary char, and values of 256 or above for named tokens). (Byacc uses #defines starting at 257, while modern bison uses an enum and starts at 258.) Each call also sets yylval, and the value in yylval is pushed (shifted) onto the parse stack, along with the token. (Both bison and byacc use two parallel stacks, one for parser state and one for values, but that's an implementation detail you need not care about. Other than "Bob Corbett wrote the first versions of both" I'm not sure why they both work the same way here.)
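As a rough sketch of that contract, here is a hand-written stand-in for one scanner step, using the union from the grammar above. The union and the token numbers are copied in by hand and are assumptions (in a real build they come from the header bison generates). scan_one is a hypothetical name, and it takes the lexeme as an argument only to keep the sketch self-contained, whereas a real yylex() takes no arguments.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hand-copied stand-in for the type bison would generate from %union. */
typedef union YYSTYPE {
    struct named_for_discussion_below {
        char *pair_sval;
        int pair_ival;
    } pair;
    int single_ival;
} YYSTYPE;

/* Assumed token numbers; bison assigns the real ones. */
enum { TOKEN = 258, INTEGER = 259 };

YYSTYPE yylval;   /* the parser picks up the token's value from here */

/* Hypothetical scanner step: classify one lexeme, fill in yylval,
 * and return the token number as a plain int. */
int scan_one(const char *text)
{
    assert(text != NULL);
    if (*text == '\0')
        return 0;                          /* 0 signals end of input */
    if (isdigit((unsigned char)*text)) {
        yylval.single_ival = atoi(text);   /* one value: name the union member */
        return INTEGER;
    }
    /* Two values: name the union member first, then the struct field. */
    char *copy = malloc(strlen(text) + 1);
    if (copy != NULL)
        strcpy(copy, text);
    yylval.pair.pair_sval = copy;
    yylval.pair.pair_ival = (int)strlen(text);   /* arbitrary second value */
    return TOKEN;
}
```

In a flex action the same two assignments would use yytext in place of text. The point is that the lexer, unlike a bison action, always spells out the union member name (pair or single_ival).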

When bison (or byacc) emits code, it uses the assigned or assumed type, from %token, %type, or an angle-bracket-supplied name, to add a union element name as needed. For instance, suppose the yacc value stack is named S (it's not, but just suppose), and suppose $1 is actually S[1], $2 being S[2], and so on. With no %union directive and no explicit types, $n just translates directly to S[n]. When you introduce %union, though, it translates to S[n].field, where the field name comes from the implied or supplied type.

Thus, in the above, when handling an INTEGER which produces only a single_ival, bison/byacc generates what you need with no additional work on your part. But, when handling a TOKEN that produces a pair, S[1].pair is not sufficient to select one element of the struct. Adding .pair_sval selects the char * element of the struct.
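To make the expansion concrete, the sketch below mimics it by hand in plain C. The array S, the two macros, and the format_* helpers are all assumptions for illustration, and the union is again copied in by hand. The real generated parser uses different internal names and does not define macros like these.

```c
#include <assert.h>
#include <stdio.h>

/* Hand-copied stand-in for the type bison would generate from %union. */
typedef union YYSTYPE {
    struct named_for_discussion_below {
        char *pair_sval;
        int pair_ival;
    } pair;
    int single_ival;
} YYSTYPE;

YYSTYPE S[3];   /* pretend value stack; S[1] plays the role of $1 */

/* Hand-written stand-ins for what $1 becomes in an action:
 * the union member is appended according to the symbol's declared type. */
#define DOLLAR1_AS_SINGLE (S[1].single_ival)   /* symbol typed <single_ival> */
#define DOLLAR1_AS_PAIR   (S[1].pair)          /* symbol typed <pair> */

/* Mirrors the INTEGER action: nothing follows $1. */
int format_integer(char *buf, size_t n)
{
    assert(buf != NULL);
    return snprintf(buf, n, "got: %d", DOLLAR1_AS_SINGLE);
}

/* Mirrors the TOKEN action: the struct field is still needed after $1. */
int format_token(char *buf, size_t n)
{
    assert(buf != NULL);
    return snprintf(buf, n, "got: %s %d",
                    DOLLAR1_AS_PAIR.pair_sval, DOLLAR1_AS_PAIR.pair_ival);
}
```

Writing $1.pair.pair_sval in an action would therefore come out as something like S[1].pair.pair.pair_sval, which does not compile. That is why the word pair must be left out.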

The struct type's name, struct named_for_discussion_below, never appears in any automatically-generated code. If you want to pass a copy of the struct type, or a pointer to an instance of it, to some routine—e.g., alter(&$1), when $1 expands to S[1].pair—you will need to use the struct type's name. If you never do this, you can omit the name entirely.
