1
votes

I have just started working with Bison and I am facing a couple of issues. The goal of my grammar is to recognize a "command pipeline" language where the output of one command can be piped as the input of another command. Each command can also have a list of parameters with an optional value. Here is an example:

command1 --param1 test --param2 test2 | command2 | command3 --param3

In the previous example, command1 takes two parameters, param1 with value test and param2 with value test2. The output of command1 is then "piped" to command2, which takes no parameters. Finally the result of command2 is piped to command3, which takes one parameter (parameters with no value are considered as a switch).

The corresponding bison grammar is given below:

%{

#include <string>
#include <iostream>
#include "AST.h"
#include "Evaluator.h"

void yyerror (const char *error);
int  yylex();

%}

%code{
    Evaluator eval;
}

%union {
  char* s;
  double d;
  AbstractNode* abstractnode;
  CmdletCall* cmdletdef;
  ParameterDef* parameterdef;
  ParameterListDef* paramlistdef;
}

/* declare tokens */

%token EOL
%token<s> PARAMETERNAME
%token<s> BUILTIN
%token<s> IDENTIFIER


%type <parameterdef> parameterDef
%type <cmdletdef> cmdletCall
%type <abstractnode> pipeLineDef    
%type <paramlistdef> paramList
%%

input: /* nothing */
    | input pipeLineDef EOL {
                                $2->accept(eval);
                                delete $2;
                                std::cout << eval.result() << std::endl;
                            }                        
    | input EOL             {}
    ;



pipeLineDef: cmdletCall                  {$$ = $1;}      
    | pipeLineDef '|' cmdletCall         {$$ = new PipeLineDef($1, $3);}   
    ;


cmdletCall: IDENTIFIER                   {$$ = new CmdletCall($1);}                 
          | cmdletCall paramList         {$1->setParameterList($2);}   
          ;

paramList: parameterDef                 {  
                                            $$ = new ParameterListDef;
                                            $$->addChildNode($1);
                                        }  
         | paramList parameterDef       {
                                            $1->addChildNode($2);
                                            $$ = $1;
                                        }
         ;

parameterDef: PARAMETERNAME             {$$ = new ParameterDef($1);}
            | parameterDef IDENTIFIER   {
                                            $1->setValue($2);
                                        }         

            ;




%%


void yyerror (const char *error)
{
  std::cout << error << std::endl;
}

The previous grammar has one shift-reduce conflict, which is reproduced below:

Terminals unused in grammar

BUILTIN

State 10 conflicts: 1 shift/reduce
State 10

7 cmdletCall: cmdletCall paramList .
9 paramList: paramList . parameterDef

PARAMETERNAME  shift, and go to state 9

PARAMETERNAME  [reduce using rule 7 (cmdletCall)]
$default       reduce using rule 7 (cmdletCall)

parameterDef  go to state 13

I would like to remove the conflict but I am not sure how I should proceed (my understanding is that a shift/reduce conflict indicates an ambiguous grammar). However, from my initial tests, everything seems to work as expected. Another point that puzzles me is the way Bison evaluates the terminals. For example, if I rewrite the cmdletCall and paramList rules as follows, the grammar breaks:

cmdletCall: IDENTIFIER paramList         {$$ = new CmdletCall($1);    $$->setParameterList($2);}   
          ;

paramList: /*nothing*/ 
                                        {  
                                            $$ = new ParameterListDef;

                                        }  
         | paramList parameterDef       {
                                            $1->addChildNode($2);
                                            $$ = $1;
                                        }
         ;

If I rewrite the grammar as shown above, then for an input such as:

command1 --param1

In the cmdletCall rule, The value of $1, which corresponds to the IDENTIFIER token, will be command1 --param1, instead of command1 only.

1
Although C (or optionally C++) is the language for which bison generates code, this question isn't actually about either of those. Tags edited.John Bollinger
That's two separate problems, which should be two separate questions. The second question almost certainly involves your lexer (and in some sense, the C/C++ code therein), although I'd suggest you just search for the need to copy yytext in your lexer; it's in all the faqs.rici
Keeping the second question as part of this same question because it illustrates the kind of issues you can have using bison and flex in practice. And since you have given the answer to both questions it makes sense to keep them together :)BigONotation

1 Answers

3
votes

Your definition of cmdletCall is effectively IDENTIFIER paramList* (using the standard regular-language Kleene star operator). But paramList is paramDef+. So that's clearly ambiguous; there is no way to tell how many paramLists follow the IDENTIFIER because there is no indication where one ends and the next one begins.

In fact, you want to have only (or at most) one paramList. There are several options, but the simplest is:

cmdletCall: IDENTIFIER paramList
paramList : /* empty */
          | paramList parameterDef

Another option would be to keep paramList non-nullable, and add a cmdletCall option which consists only of an IDENTIFIER. Really, that's only useful if you need to separate the cases for a semantic rule, but is certainly possible.

Or for an even simpler grammar, just get rid of paramList:

cmdletCall: IDENTIFIER
          | cmdletCall parameterDef

Since bison-generated parsers prefer shift to reduce, the parser produced from your ambiguous grammar will do what you expect: only produce zero or one paramList for each command. But you would be well advised to eliminate the ambiguity anyway.


The problem you had with the associated semantic value is the result of not copying the contents of yytext from your scanner; yytext is an internal datastructure in the flex-built scanner, and its contents do not belong to you, since the scanner can and will modify them as it sees fit.