Yacc Grammar not reducing the last literal of the input correctly

Question

This is a code for generating a 3 address code for the arithmetic expression.

The problem I am facing is that my grammar is able to read the input correctly till the last literal. But unable to reduce the last literal just before the '\n'

example : x = 1 + 2 * 3 - 4
1,2,3 are read properly. Even the minus. When it reads the 4 it is able to reduce the grammar for 2 steps. Then gives an error.

I Have put the print statements for help. This is the whole code. Problem while reading last literal:

x -> IDEN | NUM | ....  
f -> x  
t -> f (This doesn't happen)

icg.y

%{
    #include<stdio.h>
    #include<stdlib.h>
    int temp_reg_num = 0;
    void print_three_address_code(char* text1,char oper,char* text2)
    {
        printf("t%d = %s %c %s\n",temp_reg_num,text1,oper,text2);
    }   
%}

%token IDEN, NUM, FLOAT 
%union {
    char* text;
}

%left '+' '-'
%left '*' '/'

%%
s : IDEN '=' e '\n' {
    printf("start\n");
    if($3.text == NULL)
        printf("%s = t%d\n",$1.text,temp_reg_num-1);
    else
        printf("%s = %s\n",$1.text,$3.text);
    return 0;
}
;
e : e '+' t {
    printf("e -> e + t\n");
    char temp[5];
    print_three_address_code($1.text,'+',$3.text);
    sprintf(temp,"t%d",temp_reg_num++);
    $$.text = copy_string(temp);
}
  | e '-' t {
    printf("e -> e - t\n");
    char temp[5];
    print_three_address_code($1.text,'-',$3.text);
    sprintf(temp,"t%d",temp_reg_num++);
    $$.text = copy_string(temp);
}
  | t {
    printf("e -> t\n");
    $$.text = copy_string($1.text);
}
;
t : t '*' f {
    printf("t -> t * f\n");
    char temp[5];
    print_three_address_code($1.text,'*',$3.text);
    sprintf(temp,"t%d",temp_reg_num++);
    $$.text = copy_string(temp);
}
  | t '/' f {
    printf("t -> t / f\n");
    char temp[5];
    print_three_address_code($1.text,'/',$3.text);
    sprintf(temp,"t%d",temp_reg_num++);
    $$.text = copy_string(temp);
}
  | f {
    printf("t -> f\n");
    $$.text = copy_string($1.text);
}
;
f : f '^' x {
    printf("f -> f ^ x\n");
    char temp[5];
    print_three_address_code($1.text,'^',$3.text);
    sprintf(temp,"t%d",temp_reg_num++);
    $$.text = copy_string(temp);
}
  | x {
    printf("f -> x\n");
    $$.text = copy_string($1.text);
    printf("Why syntax error??");
}
;
x : '(' s ')' {}
      | NUM {
    printf("x -> NUM\n");
    $$.text = copy_string($1.text);
}
      | IDEN {
    printf("x -> IDEN\n");
    $$.text = copy_string($$.text,$1.text);
}
      | FLOAT {
    printf("x -> FLOAT\n");
    $$.text = copy_string($1.text);
}
      | '-' x {
    printf("x -> - x\n");
    $$.text = (char*)malloc(sizeof(char)*(strlen($2.text)+2));
    $$.text[0] = '-';
    strcat($$.text,$2.text);
}
;
%%

main()
{
    printf("Enter your expression : ");
    yyparse();
}
yyerror()
{
    printf("Syntax Error\n");
}
yywrap()
{
    return 1;
}

icg.l

%{
    #include<stdio.h>
    #include<stdlib.h>
    #include"y.tab.h"
    char* copy_string(char* fromstring)  
    {
        char* tostring = (char*)malloc(sizeof(char)*(strlen(fromstring)+1));  
        strcpy(tostring,fromstring);  
        return tostring;  
    }
%}

iden [a-zA-Z_][a-zA-Z_]*  
%%  
[0-9]+ {  
    yylval.text = copy_string(yytext);  
    return NUM;  
}
[0-9]+[.][0-9]+ {  
    yylval.text = copy_string(yytext);  
    return FLOAT;  
}
{iden} {  
    yylval.text = copy_string(yytext);  
    return IDEN;  
}  
[+] {  
    return '+';  
}  
[-] {  
    return '-';  
}  
[*] {  
    return '*';  
}  
[/] {  
    return '/';  
}  
['^'] {  
    return '^';  
}  
[=] {  
    return '=';  
}  
'\n' {  
    return '\n';  
}  
. {}  
%%

rici rici · Accepted Answer · 2016-05-15T05:13:15

You're getting that syntax error because of this rule:

'\n' { return '\n'; }

Flex does not consider ' to be in any way special, so that rule will (only) match the three character sequence 'ENTER'. That doesn't appear in your input, so the rule which requires a \n will not succeed, and when you provide an end-of-file (or the file ends, if you are reading from a file), then you'll get a syntax error.

So what you should write is:

\n { return '\n'; }

Also,

['^']

does not only match a ^. It also matches a ', because, as above, ' is not in any way special and so the character class consists of three characters, one of which is repeated. To match exactly the character ^, use double-quotes (which are special):

"^"

(However, "\n" would not work, unless what you wanted to match was precisely a \ followed by an n. Just \n is what you want.)

You can actually simplify all of those single character rules into a single rule:

[-+*/^()=\n]   { return yytext[0]; }

I included ( and ) although they were not in your icg.l file because you use them in your grammar.

However, the rule you use them in is not correct:

x : '(' s ')' {}

since s is an assignment. That would allow

x = 1 + (y = 2 * 3) - 4

but not allow the more normal

x = 1 + (2 * 3) - 4

What you want is:

x : '(' e ')' { $$ = $2; }

There are various other errors in your grammar, including:

missing #include <string.h> header (for strlen)
missing declarations for yylex, yyerror and copy_string
incorrect (or more accurately, antiquated) prototypes for main and yyerror. "Modern C" -- as in, for the last 30 years or so -- prefers explicit return types, and since C99 (1999) requires them.
an instance of copy_string with two arguments instead of one, which is undefined behaviour. (It would have been flagged as an error if you'd provided a declaration for copy_string.)

Your icg.y file was processed without complaint by Berkeley yacc (byacc) but bison will not process it even in yacc compatibility mode. The problem is your use of $1.text and your failure to declare the tags of your terminals and non-terminals. If you are declaring a union type, you should provide tag declarations both for your terminals:

%token <text> NUM IDEN FLOAT

and your non-terminals (or at least the ones which have semantic values):

%type <text> s e t f x

Then you can remove all the .text selectors from your actions, since bison (and even byacc) will know which selector to use. Declaring token tags makes your parser type-safe, or at least less type-unsafe, and it is also usually more readable.

Finally, you need to work on your memory management. You never free() any of the strings you allocated in string_copy, so you are gradually building up quite a bit of leaked memory. Furthermore, you copy in many cases where it is unnecessary; for example, in the unit rule:

x : NUM { $$.text = copy_string($1.text); }

the copy is completely unnecessary, since $1 is about to be popped off the parser stack, and the string it references will never be used again. So you're just unnecessarily leaking a copy. Much cleaner would be

x : NUM { $$ = $1; }

but that is actually unnecessary because it is the default action.

Changing the unit productions won't stop the other memory leaks; in the productions which actually do something with the semantic values other than pass them through, you'll still need to manually free the copied strings if you don't need them any longer (which seems to be the case in all your semantic actions). Or you'll need to figure out how to free them later if you will need them later.

You can use the built-in tracing feature of yacc or bison, rather than stuffing your program full of printfs which you will later have to delete. Just provide the -t option when you run yacc, or add

#define YYDEBUG 1

to the `%{...%}' block in your icg.y file. Then modify your main function:

int main() {
  yydebug = 1;
  return yyparse();
}

That will tell you exactly what's going on while your input is being parsed, including which terminals are being received from your scanner.

Here is a full example showing one approach to memory management. It's based on your code, but with a certain amount of refactoring. Note that the parser never copies a string, but emit_tac does allocate a new string, as well as freeing the strings it is passed as arguments. (That might be considered rude by some observers.) The call to yylex_destroy() in main is required for modern flex-generated scanners, to free resources allocated by the lexer. The bison-specific %destructor prevents memory leaks in case of a syntax error: if you are not using bison, you'll have to find a different solution to that issue.

icg.y

%{
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    int yylex();
    int yylex_destroy();
    void yyerror(const char* msg);

    int temp_reg_num = 0;
    char* emit_tac(char* text1, char oper, char* text2) {
        char* rv = malloc(8);
        snprintf(rv, 7, "t%d", temp_reg_num++);
        printf("%s = %s %c %s\n", rv, text1 ? text1 : "", oper, text2);
        free(text1); free(text2);
        return rv;
    }
%}

%union {
    char* text;
}
%token <text> IDEN NUMBER
%type <text> e t f x
%destructor { free($$); } <text>

%%
s : IDEN '=' e '\n' { printf("%s = %s\n", $1, $3);
                      free($1); free($3);
                    }
e : e '+' t         { $$ = emit_tac($1,'+',$3); }
  | e '-' t         { $$ = emit_tac($1,'-',$3); }
  | t
t : t '*' f         { $$ = emit_tac($1,'*',$3); }
  | t '/' f         { $$ = emit_tac($1,'/',$3); }
  | f
f : f '^' x         { $$ = emit_tac($1,'^',$3); }
  | x 
x : IDEN 
  | NUMBER
  | '-' x           { $$ = emit_tac(NULL, '-', $2); }
  | '(' e ')'       { $$ = $2; }
%%

int main() {
    int rc = yyparse();
    yylex_destroy();
    return rc;
}

void yyerror(const char* msg) {
    printf("%s\n", msg);
}

icg.l (requires flex)

%{
    /* Change to y.tab.h if you use yacc */
    #include "icg.tab.h"
    char* copy_yytext() {
        char* tostring = malloc(yyleng + 1);
        memcpy(tostring, yytext, yyleng);
        tostring[yyleng] = 0;
        return tostring;  
    }
%}
%option noinput nounput noyywrap nodefault

%%

\n                           { return '\n'; }
[[:space:]]                  /* Ignore */
[[:digit:]]+(\.[[:digit]]*)? { yylval.text = copy_yytext(); return NUMBER;  }
[[:alpha:]_][[:alnum:]_]*    { yylval.text = copy_yytext(); return IDEN; } 
.                            { return yytext[0]; }

Compile

$ bison -d icg.y
$ flex icg.l
$ gcc -Wall -o icg lex.yy.c icg.tab.c

Test with valgrind to demonstrate absence of memory leaks

$ valgrind --leak-check=full ./icg <<< ' x = 1 + 2 - 3 / 4 ^ 5 * ( 6 - 9 )'
==26225== Memcheck, a memory error detector
==26225== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==26225== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==26225== Command: ./icg
==26225== 
t0 = 1 + 2
t1 = 4 ^ 5
t2 = 3 / t1
t3 = 6 - 9
t4 = t2 * t3
t5 = t0 - t4
x = t5
==26225== 
==26225== HEAP SUMMARY:
==26225==     in use at exit: 0 bytes in 0 blocks
==26225==   total heap usage: 17 allocs, 17 frees, 16,530 bytes allocated
==26225== 
==26225== All heap blocks were freed -- no leaks are possible
==26225== 
==26225== For counts of detected and suppressed errors, rerun with: -v
==26225== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Yacc Grammar not reducing the last literal of the input correctly

1 Answers

icg.y

icg.l (requires flex)

Compile

Test with valgrind to demonstrate absence of memory leaks