1
votes

I am trying to write a simple parser using Lex and Yacc, and I have not used either tool before. After writing the lex and yacc files, I get errors when I compile them. I think the errors are related to the string header files not being included properly, but I couldn't figure it out by myself.

The Lex file named "tokens.l":

%{
#include "parser.hpp"
%}


MODEL       "model"
PORT        "input"|"output"|"intern"
GATE         "xor"|"and"|"or"|"buf"|"cmos1"|"dff"|"dlat"|"inv"|"mux"|"nand"|"nor"|"tie0"|"tie1"|"tiex"|"tiez"|"tsh"|"tsl"|"tsli"|"xnor"
INSTNAME    [A-Z0-9]+
PRIMITIVE   "primitive"
LEFT        "("
RIGHT       ")"
COMMA       ","
SEMICOLON   ";"
EQUAL       "="
BLANK       [ \t\n]+

%%

{MODEL} {return MODEL;}
{PORT}  { if (yytext == "input")
             return INPUT;
         else if (yytext == "output")
             return OUTPUT;
         else
             return INTERN;
        }
_{GATE}     {return GATE;}
{INSTNAME}  {return INSTNAME;}
{PRIMITIVE} {return PRIMITIVE;}
{LEFT}      {return LEFT;}
{RIGHT}     {return RIGHT;}
{COMMA} {return COMMA;}
{SEMICOLON} {return SEMICOLON;}
{EQUAL}     {return EQUAL;}
{BLANK} {;}
"\0"        {return END;}

%%

The yacc file named "parser.y":

%{
#include <iostream>
#include <string>
#include <cstdio>
extern FILE *fp;
%}

%union{
std::string* str;
}

%token <str> MODEL
%token <str> INPUT
%token <str> OUTPUT
%token <str> INTERN
%token <str> GATE
%token <str> INSTNAME
%token PRIMITIVE
%token LEFT
%token RIGHT
%token COMMA
%token SEMICOLON
%token EQUAL
%token END
%type <str> vfile modules module params param interngates interngate primitives

%%
vfile    : modules END   {
                std::ofstream fp;
                fp.open("output.v");
                fp<<$1;
                fp.close();
                $$ = new std::string("success");
                std::cout<<$$;
                }
modules     : modules module    {$$=$1+$2;}
            | module        {$$=$1;}
module      :MODEL INSTNAME LEFT params RIGHT LEFT interngates RIGHT
            {$$ = "module "+$2+" ("+$4+");\n"+$7+"endmodule\n";}
interngates :interngates interngate {$$=$1+$2+"\n";}
            |interngate     {$$=$1+"\n";}
interngate  :INPUT LEFT params RIGHT primitives {$$=$1+$3+"\n"+$5;}
            | OUTPUT LEFT params RIGHT primitives   { $$=$1+$3+"\n"+$5;}
            | INTERN LEFT params RIGHT primitives   {$$="wire"+$3+"\n"+$5;}

primitives  :LEFT RIGHT {$$="";}
            |LEFT PRIMITIVE EQUAL GATE INSTNAME params SEMICOLON RIGHT
            {$$=$4+" "+$5+" ("+$6+");\n";}
params      :params COMMA param {$$=$1+","+$3;}
            | param     {$$=$1;}
param       :INSTNAME   {$$=$1;}
%%

To compile the files, I use the commands below:

bison -d -o parser.cpp parser.y
lex -o tokens.cpp tokens.l
g++ -o myparser tokens.cpp parser.cpp -lfl

Can anybody give me a clue? Thanks a lot!

Update: error report on OS X: http://www.edaplayground.com/x/3HL

How about posting the error messages for us so we don't have to go compile your files? – codenheim
@codenheim, sure, but the error messages vary from one environment to another, and the error report is very long. Check it via the link above. – DreamOn
Got it. One problem is that you can't use a C++ std::string in %union. You can use a string pointer (string *str) but not the value type (string str). – codenheim

1 Answer

3
votes

You can't store a C++ std::string (or any other class with a non-trivial constructor) by value in %union. You'll need to allocate it dynamically (on the heap) and keep a pointer to it in the union.

Instead of

%union {
    string str;
}

Try:

%union {
    std::string *str;
}
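As a rough illustration of why the pointer form is the safe one, the sketch below uses a plain C++ union as a stand-in for what Bison generates from %union. The names SemanticValue, num, and make_string_value are hypothetical and not part of Bison; the point is only that a raw pointer is a trivial type while std::string is not.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for the union Bison generates from %union.
union SemanticValue {
    std::string *str;  // a raw pointer is trivial, so it can live in a plain union
    int num;           // hypothetical second member, only to make the union non-degenerate
};

// std::string owns memory, so it needs a real destructor and copy constructor.
static_assert(!std::is_trivially_destructible<std::string>::value,
              "std::string is not trivially destructible");
// The union of a pointer and an int can be copied bit by bit, which is what
// a C-style parser stack relies on.
static_assert(std::is_trivially_copyable<SemanticValue>::value,
              "a union of trivial members is trivially copyable");

// Stores a heap-allocated string in the union, as a lexer or action would.
// The caller owns the returned pointer and must delete it eventually.
inline SemanticValue make_string_value(const char *text) {
    assert(text != nullptr);
    SemanticValue value;
    value.str = new std::string(text);
    return value;
}
```

Whoever ends up holding the pointer is responsible for deleting it, since the union itself will never run a destructor.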

You will need to change every use of yylval->str, or of $$, $1, etc. where the %type of $N is <str>, to use dynamically allocated strings.

So instead of

$$ = "success";

You have to do:

$$ = new std::string("success");

It is customary to use pointers in a yacc/bison parser's YYSTYPE %union anyway, to avoid a huge amount of copying on the stack. Keep in mind that your productions should take care of freeing the strings for tokens or non-terminals that are no longer used. If your parser runtime is short-lived and the source files aren't huge, you can cheat and simply not free them, or use garbage collection.
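One consequence for the grammar in the question: once $1 and $2 are std::string pointers, an action such as $$=$1+$2 tries to add two pointers, which C++ rejects. The operands have to be dereferenced first, and the result allocated again. A minimal sketch of that pattern in plain C++, where concat_and_free is a hypothetical helper and not something Bison provides:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring an action like
//     { $$ = new std::string(*$1 + *$2); delete $1; delete $2; }
// It concatenates the pointed-to strings and frees both operands, since
// they are no longer needed once the rule has been reduced.
inline std::string *concat_and_free(std::string *lhs, std::string *rhs) {
    assert(lhs != nullptr && rhs != nullptr);
    std::string *result = new std::string(*lhs + *rhs);
    delete lhs;
    delete rhs;
    return result;  // ownership passes to the caller (the parent rule)
}
```

In an action this could be used as $$ = concat_and_free($1, $2);, assuming every <str> value on the stack was allocated with new.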

It is possible to redefine YYSTYPE to a regular string (non-pointer), but you lose the ability to use the union, which most non-trivial parsers need to pass up a mix of tokens or typed AST objects in semantic actions. Constraining your productions to a single type is less useful than void *.

It is also possible to redefine YYSTYPE to use a variant / polymorphic type, or to use a multi-member struct (a poor substitute for a variant). The former defeats the purpose of the "type safe" %type and %token macros, and the latter forces you to remember the type of each terminal or non-terminal and use explicit notation for the member of your struct ($$->str = "foo", $$->expr.left = $1->str, etc.). This is the downside to using a C based parser with C++. You may want to try Bison's C++ parser skeleton; I have little experience with it due to compile errors every time I tried it over the years.
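For comparison, here is a rough sketch of the multi-member struct approach in plain C++. SemanticValue, num, and join_params are hypothetical names chosen for illustration, and the struct is held by value here, so members are reached with . rather than ->. Every semantic value carries all members, and the code has to know which one is meaningful for a given symbol.

```cpp
#include <cassert>
#include <string>

// Hypothetical struct used in place of %union: every value carries every
// member, and nothing records which member is actually in use.
struct SemanticValue {
    std::string str;  // used by string-valued symbols such as INSTNAME
    int num = 0;      // hypothetical member for numeric symbols
};

// Emulates an action like
//     params : params COMMA param { $$.str = $1.str + "," + $3.str; }
// The member has to be named explicitly at every use.
inline SemanticValue join_params(const SemanticValue &list, const SemanticValue &item) {
    assert(!item.str.empty());  // a parameter name should never be empty
    SemanticValue out;
    out.str = list.str + "," + item.str;
    return out;
}
```

This avoids manual new and delete, at the cost of copying strings on every reduction and losing the per-symbol type checking that %type gives with a union.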

There are other (better) workarounds that I have found. I have seen Bison patched to allow boost::variant for YYSTYPE with support for %type and %token; Google "bison Michiel de Wilde" or "bison variant YYSTYPE" (http://lists.gnu.org/archive/html/bison-patches/2007-06/msg00000.html). However, like many Bison suggestions over the years, the patches were met with some vague arguments or general discussion about alternatives, and then the effort fizzled.