4
votes

I'm building a simplified C parser using bison and flex. I've written a grammar rule to detect functions in my Bison file, and I want the parser to send the name of the function to my C monitoring program. Here is an extremely simplified sample of what I've implemented:

my monitor program

/*monitor.h*/
#ifndef MONITOR_H
#define MONITOR_H

extern int noLig;
extern char* yytext;

#endif

/*monitor.c*/
#include <stdio.h>
#include <monitor.h>
#include <project.tab.h>

int noLig=0;

int main (int argc, char * argv[]) {
    printf("flex-\n");
    int err_code=yyparse();
    if (err_code == 0) {
        printf("It went fine\n");
    } else {printf("It didn't go well\n");}
    return 0;
}

project.l file

%{
#include <stdio.h>
#include <monitor.h>
#include <project.tab.h>
%}

%%
"\"[a-zA-Z0-9]+"\"  {ECHO; yylval.str=yytext; return STR;}
[.,;=()\[\]\{\}]        { return yytext[0]; }

"char"      {ECHO; printf("-"); yylval.str=yytext; return TYPE;}
"int"       {ECHO; printf("-");yylval.str=yytext; return TYPE;}
"float"     {ECHO; printf("-");yylval.str=yytext; return TYPE;}
"double"    {ECHO; printf("-");yylval.str=yytext;return TYPE;}

[a-zA-Z][a-zA-Z0-9]* {ECHO; printf("-");yylval.str = yytext; return VAR;}

[ \t\n\b]+  {noLig++;}
"//".*      {}
.       {printf(":%cwhat is this",yytext[0]);}
%%

project.y file

%{
#include    <stdio.h>
#include    <monitor.h>
int yylex();
int yyerror ();
%}
%union {
    char *str;
    int i;
}
%define parse.error verbose
%type <i> INT
%type <str> TYPE STR VAR

%token TYPE INT STR VAR

%start Program
%%
Program: function_l
    ;

function_l: function
    | function_l function
    ;

function: TYPE VAR '(' param_prototype_ld ')' '{' instruction_l '}'
    {printf("\n\nbison-\n%s\n",$1);}
    ;

param_prototype_ld: /*empty*/
    | param_prototype_l
    ;

param_prototype_l: param_prototype
    | param_prototype_l ',' param_prototype
    ;

param_prototype: TYPE VAR
    ;

instruction_l: /*empty*/
    | VAR ';'
    | VAR instruction_l
    ;
%%
int yyerror (char* s) {
    fprintf(stderr, "Compilator3000:l.%d: %s\n", noLig, s);
}

test.c file

int main (int arg) {
    a;
}

It compiles fine, with no warnings. However, when I run ./monitor < test.c, I get the following output:

flex-
int-main-int-arg-a-

bison-
int main (int arg) {
    a;
}
It went fine

Why does the bison variable $1 return the whole function block? How can I get only the return type? (In the end, my goal is to print the return type, the function name, and the types of the arguments.)

2
The contents of yytext change with each match, so they should be copied with a function like strdup. - kdhp

2 Answers

3
votes

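The value of yytext is not guaranteed to persist. It points into the scanner's internal buffer, which is modified on every match, so a pointer saved in yylval.str no longer refers to just the token's text by the time the grammar action runs; that is why $1 printed the whole function. To make the text persist, it must be copied to a separate buffer. This is often done using strdup: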

...
\"[a-zA-Z0-9]+\"    {ECHO; yylval.str=strdup(yytext); return STR;}
[.,;=()\[\]\{\}]    { return yytext[0]; }

"char"      {ECHO; printf("-"); yylval.str=strdup(yytext); return TYPE;}
"int"       {ECHO; printf("-");yylval.str=strdup(yytext); return TYPE;}
"float"     {ECHO; printf("-");yylval.str=strdup(yytext); return TYPE;}
"double"    {ECHO; printf("-");yylval.str=strdup(yytext);return TYPE;}

[a-zA-Z][a-zA-Z0-9]* {ECHO; printf("-");yylval.str = strdup(yytext); return VAR;}
...

Because strdup returns NULL when allocation fails, a wrapper function can be used to make that failure explicit:

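#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Like strdup, but exits instead of returning NULL. */
char *
strdup_checked(const char *str)
{
        char *p;

        if ((p = strdup(str)) == NULL) {
                perror("strdup");
                exit(EXIT_FAILURE);
        }
        return (p);
}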
1
votes

Although the OP has found the solution, I'll try to provide a more complete answer.

This is the grammar rule that defines a function in your code:

function: TYPE VAR '(' param_prototype_ld ')' '{' instruction_l '}'
    {printf("\n\nbison-\n%s\n",$1);}
    ;

Each symbol on the right-hand side is a positional variable: TYPE is $1 and VAR is $2 (which I believe is the function name). $$ is the semantic value of the rule's left-hand side, in this case function. To pass the function name up, set $$ = $2 in the action; setting $$ = $1 would pass up the return type instead. For this to work, function needs a declared type, e.g. %type <str> function. Alternatively, you can build a data structure in the action, such as an array or struct, that holds more than one $ value, and return that.

The effect would be seen in the following rule

function_l: function
    | function_l function
    ;

The non-terminal function then carries the name of the function. In the second alternative of this rule, function_l is $1 and function is $2 (in the first alternative, function is $1). If you print that value in an action, you get the function name that was passed up from the rule for function.

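Since the terminal VAR is a string, its text has to be copied in the lexer with yylval.str = strdup(yytext). Copying it later in the grammar action with strdup($2) is too late: by the time the action runs, the scanner has moved on and the saved pointer no longer refers to just that token.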