Parser (Yacc) seems like it ignores tokens in grammar

Question

While parsing the C-like example code below, I run into the following problem: some tokens, such as identifiers, seem to be ignored by the grammar, which causes a syntax error for no apparent reason.

Parser code :

%{
#include <stdio.h>
#include <stdlib.h>

int yylex();
void yyerror (char const *);

%}

%token T_MAINCLASS T_ID T_PUBLIC T_STATIC T_VOID T_MAIN T_PRINTLN T_INT T_FLOAT T_FOR T_WHILE T_IF T_ELSE T_EQUAL T_SMALLER T_BIGGER T_NOTEQUAL T_NUM T_STRING

%left '(' ')'
%left '+' '-'
%left '*' '/'
%left '{' '}'
%left ';' ','
%left '<' '>'

%% 
        
PROGRAM     : T_MAINCLASS T_ID '{' T_PUBLIC T_STATIC T_VOID T_MAIN '(' ')' COMP_STMT '}'
        ;

COMP_STMT   : '{' STMT_LIST '}'
        ;
    
STMT_LIST   : /* nothing */
        | STMT_LIST STMT
        ;

STMT        : ASSIGN_STMT
        | FOR_STMT
        | WHILE_STMT
        | IF_STMT
        | COMP_STMT
        | DECLARATION
        | NULL_STMT
        | T_PRINTLN '(' EXPR ')' ';'
        ;

DECLARATION : TYPE ID_LIST ';'
        ;

TYPE        : T_INT
        | T_FLOAT
        ;

ID_LIST     : T_ID ',' ID_LIST
        |
        ;

NULL_STMT   : ';'
        ;

ASSIGN_STMT : ASSIGN_EXPR ';'
        ;

ASSIGN_EXPR : T_ID '=' EXPR
        ;

EXPR        : ASSIGN_EXPR
        | RVAL
        ;

FOR_STMT    : T_FOR '(' OPASSIGN_EXPR ';' OPBOOL_EXPR ';' OPASSIGN_EXPR ')' STMT
        ;

OPASSIGN_EXPR   : /* nothing */
        | ASSIGN_EXPR
        ;

OPBOOL_EXPR : /* nothing */
        | BOOL_EXPR
        ;

WHILE_STMT  : T_WHILE '(' BOOL_EXPR ')' STMT
        ;

IF_STMT     : T_IF '(' BOOL_EXPR ')' STMT ELSE_PART
        ;

ELSE_PART   : /* nothing */
        | T_ELSE STMT
        ;

BOOL_EXPR   : EXPR C_OP EXPR
        ;

C_OP        : T_EQUAL | '<' | '>' | T_SMALLER | T_BIGGER | T_NOTEQUAL
        ;

RVAL        : RVAL '+' TERM
        | RVAL '-' TERM
        | TERM
        ;

TERM        : TERM '*' FACTOR
        | TERM '/' FACTOR
        | FACTOR
        ;

FACTOR      : '(' EXPR ')'
        | T_ID
        | T_NUM
        ;

%%

void yyerror (const char * msg)
{
  fprintf(stderr, "C-like : %s\n", msg);
  exit(1);
}

int main ()
{
  if(!yyparse()){
    printf("Compiled !!!\n");
   }
}

Part of Lexical Scanner code :

{Empty}+    { printf("EMPTY ") ; /* nothing */ }

"mainclass" { printf("MAINCLASS ") ; return  T_MAINCLASS ; }

"public"    { printf("PUBLIC ") ; return T_PUBLIC; }
    
"static"    { printf("STATIC ") ; return T_STATIC ; }

"void"      { printf("VOID ") ; return T_VOID ; }

"main"      { printf("MAIN ") ; return T_MAIN ; }

"println"   { printf("PRINTLN ") ; return T_PRINTLN ; }

"int"       { printf("INT ") ; return T_INT ; }

"float"     { printf("FLOAT ") ; return T_FLOAT ; }

"for"       { printf("FOR ") ; return T_FOR ; }

"while"     { printf("WHILE ") ; return T_WHILE ; }

"if"        { printf("IF ") ; return T_IF ; }

"else"      { printf("ELSE ") ; return T_ELSE ; }

"=="        { printf("EQUAL ") ; return T_EQUAL ; }

"<="        { printf("SMALLER ") ; return T_SMALLER ; }

">="        { printf("BIGGER ") ; return T_BIGGER ; }

"!="        { printf("NOTEQUAL ") ; return T_NOTEQUAL ; }

{id}        { printf("ID ") ; return T_ID ; }

{num}       { printf("NUM ") ; return T_NUM ; }

{string}    { printf("STRING ") ; return T_STRING ; }

{punct}     { printf("PUNCT ") ; return yytext[0] ; }

<<EOF>>     { printf("EOF ") ; return T_EOF; }

.       { yyerror("lexical error"); exit(1); }

Example :

mainclass Example {
       public static void main ( )
       {
         int c;
         float x, sum, mo;
         c=0;
         x=3.5;
         sum=0.0;
         while (c<5)
         {
                  sum=sum+x;
                  c=c+1;
                 x=x+1.5;
        }
       mo=sum/5;
       println (mo);
       }
}

Running the parser on the example produced this output:

C-like : syntax error
MAINCLASS EMPTY ID

It seems the ID token is considered to be in the wrong position, although the grammar has:

PROGRAM     : T_MAINCLASS T_ID '{' T_PUBLIC T_STATIC T_VOID T_MAIN '(' ')' COMP_STMT '}'

You should use bison's built-in trace feature, which shows you exactly what is going on, rather than trying to guess from your own sprinkling of printfs. Also, you should debug your lexer before starting on the grammar, although bison's trace feature will help with that, too, since it shows you each token as it is read. — rici
It looks like from what you show, the lexer is failing to return a '{' token for the { after the id... — Chris Dodd
In the end it worked after changing the order of the tokens at the beginning of the parser file. The built-in trace feature really helped. — gmavros1

rici · Accepted Answer · 2020-08-15T20:17:50

Based on the "solution" proposed in OP's self answer, it's pretty clear that the original problem was that the generated header used to compile the scanner was not the same as the header generated by bison/yacc from the parser specification.

The generated header includes definitions of all the token types as small integers; in order for the scanner to communicate with the parser, it must identify each token with the correct token type. So the parser generator (bison/yacc) produces a header based on the parser specification (the .y file), and that header must be #included into the generated scanner so that scanner actions can use symbolic token type names.

If the scanner was compiled with a header file generated from some previous version of the parser specification, it is quite possible that the token numbers no longer correspond with what the parser is expecting.
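To make the mismatch concrete, here is a small C sketch that models token numbering by declaration order. It is only an illustration: `stale_order` is a hypothetical earlier ordering (the question does not show the old header), `current_order` is the start of the %token line from the question, and the starting value 258 is an assumption based on what current bison versions use for the first user token. The helper names are invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: the first declared token gets code 258. */
#define FIRST_TOKEN 258
#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Hypothetical stale order: what the scanner was compiled against. */
static const char *const stale_order[] = {
    "T_MAINCLASS", "T_PUBLIC", "T_STATIC", "T_VOID", "T_MAIN", "T_ID"
};

/* Start of the %token line in the question: what the parser expects. */
static const char *const current_order[] = {
    "T_MAINCLASS", "T_ID", "T_PUBLIC", "T_STATIC", "T_VOID", "T_MAIN"
};

/* Code assigned to `name` under a declaration order, or -1 if absent. */
int token_code(const char *const *order, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(order[i], name) == 0)
            return FIRST_TOKEN + (int)i;
    return -1;
}

/* Name that a declaration order assigns to `code`, or NULL if out of range. */
const char *token_name(const char *const *order, size_t n, int code)
{
    if (code < FIRST_TOKEN || (size_t)(code - FIRST_TOKEN) >= n)
        return NULL;
    return order[code - FIRST_TOKEN];
}

/* What the parser believes it received when the scanner meant `name`. */
const char *seen_by_parser(const char *name)
{
    int code = token_code(stale_order, COUNT(stale_order), name);
    return token_name(current_order, COUNT(current_order), code);
}
```

Under these assumed orderings, `seen_by_parser("T_MAINCLASS")` is still `"T_MAINCLASS"`, but `seen_by_parser("T_ID")` is `"T_MAIN"`: the scanner prints ID, the parser receives something else, and the result is a syntax error right after MAINCLASS, much like the symptom in the question.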

The easiest way to avoid this problem is to use a build system like make, which will automatically recompile the scanner if necessary.
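As a rough sketch of such a build, assuming the hypothetical file names parser.y, scanner.l and clike, and assuming bison and flex with their default output names, a Makefile might look like this. The important line is the one that makes the scanner object depend on the generated header. Recipes are written after a `;` on the rule line so the sketch does not depend on tab characters.

```makefile
# Hypothetical names: parser.y (grammar), scanner.l (lexer), clike (program).
# Assumes the scanner uses "%option noyywrap"; otherwise add -lfl when linking.
clike: parser.tab.o lex.yy.o ; $(CC) -o $@ parser.tab.o lex.yy.o

# bison -d writes the parser and the token header together.
parser.tab.c parser.tab.h: parser.y ; bison -d parser.y

lex.yy.c: scanner.l ; flex scanner.l

# Key dependency: rebuild the scanner whenever the token header changes.
lex.yy.o: lex.yy.c parser.tab.h ; $(CC) $(CFLAGS) -c lex.yy.c

parser.tab.o: parser.tab.c ; $(CC) $(CFLAGS) -c parser.tab.c
```

With this in place, editing the %token line in parser.y regenerates parser.tab.h, which in turn forces lex.yy.o to be recompiled on the next make.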

The easiest way to detect this problem is to use bison's built-in trace facility. Enabling tracing requires only a couple of lines of code, and saves you from having to scatter printf statements throughout your scanner and parser. The bison trace will show you exactly what is going on, so not only is it less work than adding printfs, it is also more precise. In particular, it reports every token which is passed to the parser (and, with a little more effort, you can get it to report the semantic values of those tokens as well). So if the parser is getting the wrong token code, you'll see that right away.
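A minimal sketch of switching the trace on, written as a fragment of the parser file from the question. It assumes bison (or a yacc that honors YYDEBUG); the `#if` guard is an addition for illustration, and passing -t to bison is an alternative to the #define.

```yacc
%{
#include <stdio.h>
#include <stdlib.h>

#define YYDEBUG 1   /* compile the trace code into the generated parser */

int yylex();
void yyerror (char const *);
%}

/* %token and precedence declarations as in the question */

%%

/* grammar rules as in the question */

%%

int main ()
{
#if YYDEBUG
  yydebug = 1;      /* turn the trace on at run time; it is written to stderr */
#endif
  if(!yyparse()){
    printf("Compiled !!!\n");
  }
}
```

The trace lists each token as the parser reads it, so a token that the scanner prints as ID but the parser reports under a different name points directly at a header mismatch.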
