
With a fairly basic Bison/Flex grammar, I'm trying to pull tokens and expressions into C++ objects from which I can generate three op code (i.e. an internal representation). I'm doing this because this particular parser handles a smaller subset of a larger parser. My problem comes with repeated expressions and tokens.

For example:

10 + 55 will parse as 10 + 10.

10 + VARIABLLENAME will parse fine, as INT and VARIABLE are different tokens.

55-HELLOWORLD / 100 will again parse fine, presumably because there are never two of the same token on either side of the expression.

55-HELLOWORLD - 100 segfaults. Repeating operation tokens (i.e. -, +, /, etc.) causes the parser to crash.

TLDR: When repeating Value Types (i.e. INT, FLOAT, VARIABLE), the same token is returned twice. When repeating Operations, the parser seg faults.

My presumption is that the problem lies in how I load the $1/$3 values into class objects and then add them to the parser stack. I've tried checking the memory addresses of each variable and pointer I generate, and they all appear to be as I'd expect (i.e. I'm not overwriting the same object). I've also checked that values are loaded properly in their own rules: the INT and VARIABLE alternatives both load their respective values into classes correctly.

The issue seems to be pinpointed to the "expression OPERATION expression" rules: when both sides are the same type of value, the two expressions come out identical. To use an earlier example:

10 + 55 -> expression PLUS expression -> $1 = 10, $3=10

Yet when the values are loaded in the INT rule, both are as expected.

Here's my parser.y, as well as the objects I'm trying to load values into.

%{
  #include <cstdio>
  #include <iostream>
  #include "TOC/Operation.h"
  #include "TOC/Value.h"
  #include "TOC/Variable.h"
  #include "TOC.h"

  using namespace std;

  extern int yylex();
  extern int yyparse();
  extern FILE *yyin;

  void yyerror(const char *s);
%}

%code requires {
    // This is required to force bison to include TOC before the preprocessing of union types and YYTYPE.
    #include "TOC.h"
}

%union {
  int ival;
  float fval;
  char *vval;
  TOC * toc_T;
}

%token <ival> INT
%token <fval> FLOAT
%token <vval> VARIABLE

%token ENDL PLUS MINUS MUL DIV LPAREN RPAREN

%type <toc_T> expression1
%type <toc_T> expression

%right PLUS MINUS
%right MUL DIV

%start start
%%
start:
        expressions;
expressions:
    expressions expression1 ENDL
    | expression1 ENDL;
expression1:
    expression { 
        TOC* x = $1;
        cout<<x->toTOCStr()<<endl; 
    }; 
expression: 
    expression PLUS expression { 
        TOC *a1 = $1;
        TOC *a2 = $3;
        Operation op(a1, a2, OPS::ADD);
        TOC *t = &op;
        $$ = t;
    }
    |expression MINUS expression { 
        TOC *a1 = $1;
        TOC *a2 = $3;
        Operation op(a1, a2, OPS::SUBTRACT);
        TOC *t = &op;
        $$ = t;    
    }
    |expression MUL expression {
        TOC *a1 = $1;
        TOC *a2 = $3;
        Operation op(a1, a2, OPS::MULTIPLY);
        TOC *t = &op;
        $$ = t;
    }
    |expression DIV expression { 
        TOC *a1 = $1;
        TOC *a2 = $3;
        Operation op(a1, a2, OPS::DIVIDE);
        TOC *t = &op;
        $$ = t;
    }
    |LPAREN expression RPAREN { 
        TOC *t = $2; 
        $$ =  t;
    }
    | INT { 
        Value<int> v = $1;
        TOC *t = &v; 
        $$ =  t;
    }
    | FLOAT { 
        Value<float> v = $1;
        TOC *t = &v;
        $$ = t; 
    }
    | VARIABLE {
        char* name = $1;
        Variable v(name);
        TOC *t = &v;
        $$ = t;
    }
%%

void yyerror(const char *s) {
  cout << "Parser Error:  Message: " << s << endl;
  exit(-1);
}

And here are the classes I'm trying to load values into (concatenated into one listing, for clarity).

Operation.h

enum OPS {
    SUBTRACT,
    ADD,
    MULTIPLY,
    DIVIDE,
    EXPONENT
};

class Operation : public TOC{

    OPS op;
    public:
        TOC* arg1;
        TOC* arg2;
        Operation(TOC* arg1_in, TOC* arg2_in, OPS operation){
            tt = TOC_TYPES::OPERATION_E;
            arg1 = arg1_in;
            arg2 = arg2_in;
            op = operation;
        };


        std::string toOPType(OPS e){
            switch (e){
                case SUBTRACT:
                    return "-";
                case ADD:
                    return "+";
                case MULTIPLY:
                    return "*";
                case DIVIDE:
                    return "/";
                case EXPONENT:
                    return "^";
                default:
                    return "[Operation Error!]";
            }
        }

        std::string toTOCStr(){
            return arg1->toTOCStr() + toOPType(op) + arg2->toTOCStr();
        }
};

Value.h

template <class T> class Value : public TOC {
    public:
        T argument;
        Value(T arg){
            tt = TOC_TYPES::VALUE_E;
            argument = arg;
        }

        std::string toTOCStr(){
            std::string x = std::to_string(argument);
            return x;
        }
};

Variable.h

class Variable : public TOC {
    public:
        char *name;
        Variable(char* name_in){
            tt = TOC_TYPES::VARIABLE_E;
            name = name_in;
        }
        std::string toTOCStr(){
            std::string x = name;
            return x;
        }
};

TOC.h, in case this is needed

enum TOC_TYPES { 
    VARIABLE_E, 
    VALUE_E,
    OPERATION_E
};

class TOC{
    public:
        TOC_TYPES tt;   
        virtual std::string toTOCStr() = 0;
};

My main file simply opens a file and points yyin at it before calling yyparse. I haven't included it, but can if need be (it's not very exciting).

Ideally, I'd like to load my entire parse tree into a TOC*, which I can then walk to generate three op code at each level. This failure on repeated tokens and operations is really stumping me, however.

1 Answer

Here's an example of the problem:

    Operation op(a1, a2, OPS::ADD);
    TOC *t = &op;
    $$ = t;

(t is unnecessary; you could just as well have written $$ = &op;. But that's just a side-note.)

op here is an automatic variable, whose lifetime ends when the block is exited. And that happens immediately after its address is saved in $$. That makes the semantic value of the production a dangling pointer.

Using the address of a variable whose lifetime has ended is Undefined Behaviour, but you can probably guess what is happening: the next time the block is entered, the stack is at the same place and the new op has the same address as the old one. (There's no guarantee that that will happen: undefined behaviour is undefined by definition. But this particular result is consistent with your observation.)

In short, get cosy with the new operator:

$$ = new Operation(a1, a2, OPS::ADD);
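
To see why this helps with the 10 + 55 case, here is a minimal standalone sketch outside of Bison. The types IntValue and Add and the helpers make_int, make_add and render_ten_plus_fifty_five are simplified, hypothetical stand-ins for the question's Value<int>, Operation and grammar actions, not the original headers. Because both operands are alive on the heap at the same time, they cannot share an address, so the rendered string keeps both values.

```cpp
#include <cassert>
#include <string>

// Simplified stand-ins for the question's classes (assumed, not the original headers).
struct TOC {
    virtual ~TOC() = default;  // lets a node be deleted through a TOC*
    virtual std::string toTOCStr() const = 0;
};

struct IntValue : TOC {
    int argument;
    explicit IntValue(int arg) : argument(arg) {}
    std::string toTOCStr() const override { return std::to_string(argument); }
};

struct Add : TOC {
    TOC* arg1;
    TOC* arg2;
    Add(TOC* a, TOC* b) : arg1(a), arg2(b) {}
    std::string toTOCStr() const override {
        return arg1->toTOCStr() + "+" + arg2->toTOCStr();
    }
};

// Plays the role of the INT action: the node outlives the block that creates it.
TOC* make_int(int v) { return new IntValue(v); }

// Plays the role of the "expression PLUS expression" action.
TOC* make_add(TOC* lhs, TOC* rhs) { return new Add(lhs, rhs); }

// Builds the tree for 10 + 55, renders it, and frees the nodes by hand.
std::string render_ten_plus_fifty_five() {
    TOC* lhs = make_int(10);
    TOC* rhs = make_int(55);
    assert(lhs != rhs);  // both objects are alive, so their addresses differ
    TOC* sum = make_add(lhs, rhs);
    std::string text = sum->toTOCStr();
    delete sum;  // Add does not own its children in this sketch
    delete rhs;
    delete lhs;
    return text;
}
```

Calling render_ten_plus_fifty_five() should give "10+55" rather than "10+10".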

And don't forget to delete it at an appropriate moment.
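
One way to arrange the deletion, sketched under some assumptions: TOC as posted has no virtual destructor, so deleting a derived node through a TOC* would itself be undefined behaviour. The sketch below adds one and lets each Operation own and delete its two operands, so that deleting the root frees the whole tree. The classes are simplified versions of the question's, and the live counter and the function live_after_building_and_deleting are hypothetical, added only to make the clean-up observable.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct TOC {
    static int live;  // hypothetical instance counter, for this demo only
    TOC() { ++live; }
    virtual ~TOC() { --live; }  // virtual, so delete through a TOC* is well defined
    virtual std::string toTOCStr() const = 0;
};
int TOC::live = 0;

struct IntValue : TOC {
    int argument;
    explicit IntValue(int arg) : argument(arg) {}
    std::string toTOCStr() const override { return std::to_string(argument); }
};

struct Variable : TOC {
    std::string name;  // owns a copy of the name instead of borrowing a char*
    explicit Variable(std::string n) : name(std::move(n)) {}
    std::string toTOCStr() const override { return name; }
};

struct Operation : TOC {
    TOC* arg1;
    TOC* arg2;
    char op;
    Operation(TOC* a, TOC* b, char o) : arg1(a), arg2(b), op(o) {}
    Operation(const Operation&) = delete;             // copying would double-delete
    Operation& operator=(const Operation&) = delete;
    ~Operation() override {
        delete arg1;  // each operation owns its operands
        delete arg2;
    }
    std::string toTOCStr() const override {
        return arg1->toTOCStr() + op + arg2->toTOCStr();
    }
};

// Builds a tree for 55-HELLOWORLD-100 (the nesting chosen here is arbitrary),
// deletes the root, and reports how many nodes are still alive afterwards.
int live_after_building_and_deleting() {
    TOC* root = new Operation(
        new IntValue(55),
        new Operation(new Variable("HELLOWORLD"), new IntValue(100), '-'),
        '-');
    assert(TOC::live == 5);
    assert(root->toTOCStr() == "55-HELLOWORLD-100");
    delete root;  // recursively frees all five nodes
    return TOC::live;
}
```

In the grammar, the natural place for the delete would be the expression1 action, once the tree has been printed or consumed. Bison's %destructor directive (for example %destructor { delete $$; } <toc_T>) may also be worth a look for values discarded during error recovery, although the posted yyerror exits immediately, so that matters less here.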