3
votes

I am working with Flex and Bison. In my parser.y (Bison) I define tokens. When a token is returned, it comes back as an int. I was wondering if there is a way to take that int and map it back to the actual name in the Bison source. For example, in my parser.y:

//define my tokens that are shared with my lexer (flex)
%token <tokenData> ID
%token <tokenData> NUMCONST

in my grammar I then use

number : NUMCONST   {std::cout<<"Line "<<$1->linenum<<" Token: [I want NUMCONST]"<<std::endl;}

I know I can display the int that is returned from the lexer, but is there a way to return the token's type, such as "NUMCONST" or "ID"? I want the token "type" instead of the token "int".

2 Answers

15
votes

Yes you can, but you need to enable the feature in your bison file.

If you put the directive %token-table into your bison file, then bison will generate a table of token names called yytname. (You can also enable this feature with the -k or --token-table command-line flags.)

yytname[i] is the name of the token whose "internal bison token code number" is i. That's not the same as the number returned by yylex, because bison recodes the tokens using an (undocumented) table called yytranslate.

The token names in the yytname table are the token aliases if you use that feature. For example, if your grammar included:

%token EQEQ "=="
%%
exp: exp "==" exp
   | exp '+' exp

the names for the tokens corresponding to the two operators shown in the exp rule are "==" and '+'.

yytname also includes the names of non-terminals, in case you need those for any purpose.

Rather than using yytranslate[t], you might want to use YYTRANSLATE(t), which is what the bison-generated parser itself does. That macro translates out-of-range integers to 2, whose corresponding name is $undefined. That name will also show up for any single-character tokens which are not used anywhere in the bison grammar.
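
The two-step lookup described above can be sketched in isolation. Everything in this block is a hand-written stand-in rather than bison output: mock_yytname, mock_translate, mock_token_name, and the token codes 258 and 259 are assumptions chosen for illustration. A real parser would use the generated yytname and YYTRANSLATE instead.

```c
#include <stddef.h>

/* Token codes as yylex might return them (assumed values, not bison output). */
enum { ID = 258, NUMCONST = 259 };

/* Stand-in for yytname: indexed by the internal symbol number,
   terminals first, then non-terminals. */
static const char *const mock_yytname[] = {
    "$end", "error", "$undefined",   /* internal numbers 0, 1, 2 */
    "ID", "NUMCONST",                /* internal numbers 3, 4    */
    "$accept", "number",             /* non-terminals            */
    NULL
};

/* Stand-in for YYTRANSLATE: external token code -> internal symbol number.
   Anything unrecognised maps to 2, the slot named "$undefined". */
int mock_translate(int t)
{
    switch (t) {
    case 0:        return 0;
    case ID:       return 3;
    case NUMCONST: return 4;
    default:       return 2;
    }
}

/* Name lookup: translate first, then index the name table. */
const char *mock_token_name(int t)
{
    return mock_yytname[mock_translate(t)];
}
```

With these stand-ins, mock_token_name(NUMCONST) yields "NUMCONST", while an unknown code such as 9999 yields "$undefined". Indexing mock_yytname directly with 259 would read past the end of the table, which is why the translation step cannot be skipped.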

Both yytname and yytranslate are declared static const in the bison-generated parser, so you can use them only in code which is present in that file. If you want to expose a function which does the translation, you can put the function in the grammar epilogue, after the second %%. (You might need such a function if you wanted to find the name corresponding to a token number in the scanner, for example.) It might look something like this:

/* Return the name bison recorded for token code t (as returned by yylex). */
const char *token_name(int t) {
  return yytname[YYTRANSLATE(t)];
}

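Normally, there is no need to do this. If you merely want to track what the parser is doing, you're much better off enabling bison's trace facility (generate the parser with -t or --debug, then set yydebug to a nonzero value before calling yyparse).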

1
votes

bison generates an enum called yytokentype that contains an enumerated list of all the tokens in the grammar. It does not provide an equivalent mapping to strings containing all the token names.

So, you'll have to implement this mapping yourself: a utility function that takes a yytokentype parameter and returns the name of the given token, which you can then use in your diagnostic messages. Another boring switch farm.
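
A minimal sketch of such a hand-written mapping, in C. The enum below is only a stand-in for the yytokentype that bison writes into the generated header, with assumed values; in a real project you would include that header instead. The function name token_type_name is hypothetical.

```c
/* Stand-in for the enum bison generates; in real code, include the
   generated header (e.g. parser.tab.h) instead. Values are assumed. */
enum yytokentype { ID = 258, NUMCONST = 259 };

/* Hand-maintained mapping from token code to token name. */
const char *token_type_name(int t)
{
    switch (t) {
    case ID:       return "ID";
    case NUMCONST: return "NUMCONST";
    default:       return "(unknown token)";
    }
}
```

Here token_type_name(NUMCONST) returns "NUMCONST". Every new %token needs a matching case, so the function has to be kept in sync with the grammar by hand.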

Having said that, it shouldn't be too difficult to write a utility Perl script, or an equivalent, that reads the <filename>.tab.h that came out of bison, parses out the yytokentype enumeration, and robo-generates the mapping function. Stick that into your Makefile, with a suitable dependency rule, and you've got yourself an automatic robo-generator of a token-to-name mapping function.