I am trying to construct a parser with Bison. I have the following in the first section:

%union {
    int ttype;
    // enums used in lexer
    Staff stafftype;
    Numeral numeral;
    Quality quality;
    Inversion inversion;
    Pitch pitch;
    Accidental accidental;
    // Classes used in parser
    Roman roman;
}

%token <stafftype> STAFFTYPE
%token <numeral> NUMERAL
%token <quality> QUALITY
%token <inversion> INVERSION
%token <pitch> PITCH
%token <accidental> ACCIDENTAL
%token <ttype> COLON
%token <ttype> SLASH
%token <ttype> COMMA

%type <roman> accidentalRoman

I also have some grammar rules. Here is one:

accidentalRoman
    : NUMERAL { $$ = Roman($1); }
    | ACCIDENTAL NUMERAL { $$ = Roman($2, $1); }
    ;

I basically have three related questions.

  1. What does the %union really represent? I thought it represented types the lexer could return. My lexer rules contain statements like return STAFFTYPE, to indicate that I have populated yylval.stafftype with a Staff object. Fair enough. However;
  2. the union also seems to have something to do with the $$ = statements in the grammar actions. Why do the result types of grammar actions need to be in the union?
  3. In my example, the Roman class has a constructor with parameters. However, declaring it in the union causes the error no matching function for call to 'Roman::Roman()'. Is there any way around this? I'm trying to build up a parse tree with $$ =, and the nodes in the tree definitely need parameters in their constructors. In fact, it doesn't even allow a 0-parameter constructor: error: union member 'YYSTYPE::roman' with non-trivial 'Roman::Roman()'.
It has been years since I used bison. As far as I know, yacc/bison generates C code from your .y file. Please have a look at it; it may help you understand what happens "under the hood". You should find a union which contains exactly what %union provides. A union (in C++) and classes do not work well together (because of the construction issue). It may work under some restrictions, but I prefer to avoid it completely. - Scheff's Cat
The %union is used to attach attributes to rules. Whenever extra data has to be passed along with a token (flex) or a rule (bison), %union comes into play. That was fine in C but may cause conflicts if you want to use it in C++. (Sorry, I'm starting to repeat myself.) A workaround could be to use global variables (maybe even stacks) instead of attributes (crafted with %union). For me, this was one reason to stop using bison and write recursive descent parsers instead. (By the way, debugging and maintenance were the main reasons.) - Scheff's Cat

1 Answer

  1. What does the %union really represent? I thought it represented types the lexer could return.

No. It represents the types of semantic values: what productions return via $$ =, and what tokens carry in yylval. The lexer itself only returns integer token codes defined via %token directives. It can populate a yylval member as a side effect, but that is not a return type of the lexer in any sense.

My lexer rules contain statements like return STAFFTYPE, to indicate that I have populated yylval.stafftype with a Staff object.

They shouldn't. They should return the token types used in the grammar, and you usually shouldn't put anything into yylval except in the case of literals. Otherwise you're doing work in the lexer that the parser should do.
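As a rough illustration of the split described above, here is a minimal hand-written sketch of a lexer whose return value is only an integer token code, with yylval filled in as a side effect for literals. The names toy_yylex, NUMBER and the string-cursor parameter are assumptions made for this example; a real flex-generated yylex() takes no such parameter, and Bison generates the token codes and YYSTYPE itself.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical stand-ins for what Bison would generate from %token and %union.
enum TokenKind { END = 0, NUMBER = 258, COLON = 259 };

union YYSTYPE {
    int ttype;
    int number;
};

YYSTYPE yylval;  // written by the lexer as a side effect, read by the parser

// Toy lexer over a string cursor. The return value is only the token code.
int toy_yylex(const char *&p) {
    assert(p != nullptr);
    while (*p == ' ') ++p;                 // skip blanks
    if (*p == '\0') return END;            // end of input
    if (*p == ':') { ++p; return COLON; }  // punctuation: no semantic value needed
    if (std::isdigit(static_cast<unsigned char>(*p))) {
        int v = 0;
        while (std::isdigit(static_cast<unsigned char>(*p)))
            v = v * 10 + (*p++ - '0');
        yylval.number = v;                 // literal: its value goes into yylval
        return NUMBER;
    }
    return static_cast<unsigned char>(*p++);  // any other character: its own code
}
```

Calling toy_yylex repeatedly on the input "42 :" would yield NUMBER (with yylval.number set to 42), then COLON, then END.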

  2. the union also seems to have something to do with the $$ = statements in the grammar actions. Why do the result types of grammar actions need to be in the union?

Because that's where they are placed: on top of the parser's stack of semantic values. Every slot on that stack has the union type (YYSTYPE), the same type as yylval.
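To make the stack idea concrete, here is a simplified sketch of what a reduction does with the values. It is not Bison's generated code: value_stack, shift and reduce_accidental_roman are hypothetical names, and RomanPod is a deliberately trivial struct (unlike the Roman class in the question) so that it can sit in the union directly.

```cpp
#include <cassert>
#include <vector>

// Hypothetical trivial struct, used only so it can live inside the union.
struct RomanPod { int numeral; int accidental; };

union YYSTYPE {
    int numeral;
    int accidental;
    RomanPod roman;
};

// One stack holds token values and production results alike.
std::vector<YYSTYPE> value_stack;

// Shifting a token pushes the value the lexer left in yylval.
void shift(YYSTYPE v) { value_stack.push_back(v); }

// Reduction for: accidentalRoman : ACCIDENTAL NUMERAL
void reduce_accidental_roman() {
    assert(value_stack.size() >= 2);
    YYSTYPE d2 = value_stack.back(); value_stack.pop_back();  // $2 (NUMERAL)
    YYSTYPE d1 = value_stack.back(); value_stack.pop_back();  // $1 (ACCIDENTAL)
    YYSTYPE result;                                           // $$
    result.roman = RomanPod{d2.numeral, d1.accidental};
    value_stack.push_back(result);  // the result replaces the right-hand side
}
```

Since $1, $2 and $$ all live in slots of the same stack, the result type of an action has to be a member of the same union as the token value types.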

  3. In my example, the Roman class has a constructor with parameters. However, declaration in the union causes the error no matching function for call to 'Roman::Roman()'. Is there any way around this? I'm trying to build up a parse tree with $$ =, and the nodes in the tree definitely need parameters in their constructors. In fact, it doesn't even allow a 0-parameter constructor: error: union member YYSTYPE::roman with non-trivial Roman::Roman().

In general the %union should consist of ints, doubles, other primitive types, and pointers. Objects in unions are problematic anyway, and on a parser stack they are mostly a massive waste of space.
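One way to apply this to the Roman case is to store a pointer in the union and construct the object with new inside the action, roughly { $$ = new Roman($2, $1); }. The sketch below shows the idea in plain C++. The enum values, the Roman members, and the helpers make_accidental_roman and as_roman are assumptions for illustration, not the asker's actual classes.

```cpp
#include <cassert>

// Hypothetical enums standing in for the asker's lexer types.
enum Numeral { NUM_I = 1, NUM_II, NUM_III, NUM_IV, NUM_V, NUM_VI, NUM_VII };
enum Accidental { ACC_FLAT = -1, ACC_NATURAL = 0, ACC_SHARP = 1 };

// A class with no default constructor, like the Roman in the question.
class Roman {
public:
    explicit Roman(Numeral n, Accidental a = ACC_NATURAL)
        : numeral_(n), accidental_(a) {}
    Numeral numeral() const { return numeral_; }
    Accidental accidental() const { return accidental_; }
private:
    Numeral numeral_;
    Accidental accidental_;
};

union YYSTYPE {
    Numeral numeral;
    Accidental accidental;
    Roman *roman;  // pointer member: the union never has to construct a Roman
};

// Mirrors the action { $$ = new Roman($2, $1); } for ACCIDENTAL NUMERAL.
YYSTYPE make_accidental_roman(YYSTYPE d1, YYSTYPE d2) {
    YYSTYPE result;
    result.roman = new Roman(d2.numeral, d1.accidental);
    return result;
}

// Convenience accessor that checks the pointer before dereferencing it.
const Roman &as_roman(const YYSTYPE &v) {
    assert(v.roman != nullptr);
    return *v.roman;
}
```

With this layout the grammar would declare Roman *roman; in %union, and whoever consumes the value is responsible for deleting it. Bison's %destructor directive (for example %destructor { delete $$; } <roman>) can release values that are discarded during error recovery.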