Your %union directive looks ... well, "almost OK" in terms of what you've shown of it, but there's a missing close brace. I cannot say anything about the part you omitted, but int int is a syntax error, so I have to assume that this is not what's there either.
The code in the braces (both the flex and bison sections) does not match the fragment shown in the union.
Here's some correct syntax (I added more names for discussion purposes, and some other items to make the output compile-able with gcc -O -Wall -c):
%{
#include <stdio.h>
extern int yylex(void);
extern int yyerror(const char *);
%}
%union {
    struct named_for_discussion_below {
        char *pair_sval;
        int pair_ival;
    } pair;
    int single_ival;
}
%token <pair> TOKEN
%token <single_ival> INTEGER
%%
prog: exprlist;
exprlist: exprlist expr
    | /*empty*/
    ;
expr : TOKEN { printf("got: %s %d\n", $1.pair_sval, $1.pair_ival); }
    | INTEGER { printf("got: %d\n", $1); }
    ;
Note that, because of the types supplied in the two %token directives, bison assumes that $1 is an instance of struct named_for_discussion_below, containing pair_sval and pair_ival, when the token is TOKEN, but that $1 is just a simple single_ival value when the token is INTEGER. You must select the structure member (.pair_sval or .pair_ival) when accessing the pair value, but you must omit the word pair. When accessing the single_ival you omit the word single_ival too; and since there's no .field sub-name, nothing else appears after $1.
Extended discussion
It may help, at least if you know the basics of how the generated parser works, to note here that each element of the parse stack is a union type. (Well, it is after using %union, otherwise it's just an ordinary int.)
The %union directive supplies the contents for this type. The internal name for this is union YYSTYPE, and it has a typedef-alias spelled YYSTYPE, which is what you (or flex) should use when setting up the auxiliary value for each token. Each call to yylex() must return an ordinary int value, which is the token number (0 for EOF, 1 through 255 for an ordinary char, and values of 256 and above for named tokens). (Byacc uses #defines starting at 257, while modern bison uses an enum and starts at 258.) Each call also sets yylval, and the value in yylval is pushed (shifted) onto the parse stack, along with the token. (Both bison and byacc use two parallel stacks, one for parser state and one for values, but that's an implementation detail you need not care about. Other than "Bob Corbett wrote the first versions of both" I'm not sure why they both work the same way here.)
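To make the yylex() contract concrete, here is a minimal hand-written lexer sketch. It is only an illustration under assumptions: the union and the token numbers are copied in by hand (normally they come from the header bison generates, and the actual numbers may differ), set_input() and the string cursor are hypothetical helpers, and storing the word length in pair_ival is an arbitrary choice just to have something to put there.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for what the bison-generated header would provide.
 * The token numbers here are assumed, not taken from real output. */
enum { TOKEN = 258, INTEGER = 259 };

typedef union YYSTYPE {
    struct named_for_discussion_below {
        char *pair_sval;
        int pair_ival;
    } pair;
    int single_ival;
} YYSTYPE;

YYSTYPE yylval;

/* Hypothetical input cursor; a real lexer would read from a FILE. */
static const char *input = "";

void set_input(const char *s) { input = s; }

int yylex(void)
{
    while (*input == ' ' || *input == '\t' || *input == '\n')
        input++;
    if (*input == '\0')
        return 0;                       /* 0 means end of input */

    if (isdigit((unsigned char)*input)) {
        char *end;
        /* INTEGER is declared <single_ival>, so fill that member. */
        yylval.single_ival = (int)strtol(input, &end, 10);
        input = end;
        return INTEGER;
    }
    if (isalpha((unsigned char)*input)) {
        const char *start = input;
        while (isalnum((unsigned char)*input))
            input++;
        size_t len = (size_t)(input - start);
        char *copy = malloc(len + 1);
        if (copy == NULL)
            abort();
        memcpy(copy, start, len);
        copy[len] = '\0';
        /* TOKEN is declared <pair>, so fill both struct members. */
        yylval.pair.pair_sval = copy;
        yylval.pair.pair_ival = (int)len;   /* arbitrary: word length */
        return TOKEN;
    }
    return (unsigned char)*input++;     /* any other char is its own token */
}
```

Note that the lexer always spells out the union member (yylval.pair or yylval.single_ival) itself; nothing fills that in for you on the lexer side.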
When bison (or byacc) emits code, it uses the assigned or assumed type, from %token, %type, or an angle-bracket-supplied name, to add a union element name as needed. For instance, suppose the yacc value stack is named S (it's not but just suppose), and suppose $1 is actually S[1], $2 being S[2], and so on. With no %union directive and no explicit types, $n just translates directly to S[n]. When you introduce %union, though, it translates to S[n].field, where the field name comes from the implied or supplied type.
Thus, in the above, when handling an INTEGER which produces only a single_ival, bison/byacc generates what you need with no additional work on your part. But, when handling a TOKEN that produces a pair, S[1].pair is not sufficient to select one element of the struct. Adding .pair_sval selects the char * element of the struct.
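Written out as plain C, the two actions from the grammar above amount to something like the following. This is a sketch only: S is the same made-up stack name used in this discussion (the real generated code uses different names and indexing), and format_token() and format_integer() are hypothetical wrappers I added so that the expanded expressions have somewhere to live.

```c
#include <stdio.h>

typedef union YYSTYPE {
    struct named_for_discussion_below {
        char *pair_sval;
        int pair_ival;
    } pair;
    int single_ival;
} YYSTYPE;

/* Pretend value stack; slot 0 is unused so that S[1] lines up with $1. */
YYSTYPE S[3];

/* expr: TOKEN
 * $1.pair_sval becomes S[1].pair.pair_sval: ".pair" is added for you,
 * ".pair_sval" is the part you wrote in the action. */
int format_token(char *buf, size_t n)
{
    return snprintf(buf, n, "got: %s %d\n",
                    S[1].pair.pair_sval, S[1].pair.pair_ival);
}

/* expr: INTEGER
 * $1 becomes S[1].single_ival: the member name is added for you and
 * there is nothing further to select. */
int format_integer(char *buf, size_t n)
{
    return snprintf(buf, n, "got: %d\n", S[1].single_ival);
}
```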
The struct type's name, struct named_for_discussion_below, never appears in any automatically-generated code. If you want to pass a copy of the struct type, or a pointer to an instance of it, to some routine (e.g., alter(&$1), when $1 expands to S[1].pair), you will need to use the struct type's name. If you never do this, you can omit the name entirely.
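For example, a helper like alter() has to name the struct type in its parameter list. The sketch below assumes a made-up body for alter() (it just bumps pair_ival), and demo() is a hypothetical stand-in for the action code, taking a pointer to one value-stack slot in place of the real stack.

```c
typedef union YYSTYPE {
    struct named_for_discussion_below {
        char *pair_sval;
        int pair_ival;
    } pair;
    int single_ival;
} YYSTYPE;

/* The struct tag is what lets us declare this parameter at all. */
void alter(struct named_for_discussion_below *p)
{
    p->pair_ival += 1;      /* made-up behavior, for illustration only */
}

/* Stand-in for an action containing alter(&$1) where $1 has type <pair>;
 * the expansion is roughly alter(&S[1].pair). */
int demo(YYSTYPE *slot)
{
    alter(&slot->pair);
    return slot->pair.pair_ival;
}
```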