Syntax error when testing a blank file - LEX/YACC

Question

I get a syntax error whenever I test an empty file or a file with just a comment in it. The error looks like this:

test01 is just an empty .c file

Here is my Lex file:

%{
/*constants are defined outside of the l file in y.tab.h
*constants are defined from 257
*/

#include "y.tab.h"
int input_line_no = 1;
char the_tokens[1000];
char full_line[1000];
int lex_state = 0;

%}

whitespace         [ \t]
number             [0-9]
letter             [A-Za-z]
alfanum            [A-Za-z0-9_]
intcon             {number}+
id                 {letter}{alfanum}*
anything           .

%option noyywrap
 /*
 *Start conditions are specified to identify comments, 
 *literal strings, and literal chars. 
 */

%Start comment_in comment_out string_in char_in

%%

/*tokenization of special strings*/
"extern"        {return EXTERN;}
"if"            {return IF;}
"else"          {return ELSE;}
"void"          {return VOID;}
"char"          {return CHAR;}
"int"           {return INT;}



 /*line number is recorded*/
[\n]                        input_line_no++;

 /*identify comment*/
<INITIAL>"/""*"      {
                lex_state=1;
                                BEGIN(comment_in);
                 }
<comment_in>"*"      BEGIN(comment_out);
<comment_in>[^*]     ;
<comment_out>"*"     ;
<comment_out>"/"     {
            lex_state = 0;
                    BEGIN(INITIAL);
                 }
<comment_out>[^*/]   BEGIN(comment_in);

/*start tokenization of strings*/
<INITIAL>\"             {
            lex_state = 2;
                            BEGIN(string_in);

                    }
<string_in>[^"]     {
            return STRINGCON;
        }
<string_in>\"       {
            lex_state = 0;
            BEGIN(INITIAL);
        }
 /*tokenization of characters*/
<INITIAL>\' {
        lex_state = 3;
        BEGIN(char_in);
    }
<char_in>[^']
     {
         return CHARCON;
     }
<char_in>\' {
        lex_state = 0;
        BEGIN(INITIAL);
    }

{whitespace}    ;

 /*tokenization of numbers*/
{intcon}         {return(INTCON);}
{id}        {return ID;}

/*tokenization of operations*/
"=="        {return EQUALS;}
"!="        {return NOTEQU;}
">="        {return GREEQU;}
"<="        {return LESEQU;}
">"     {return GREATE;}
"<"     {return LESSTH;}

"&&"        {return ANDCOM;}
"||"        {return ORCOMP;}
"!"             {return ABANG;}

";"     {return SEMIC;}
","     {return COMMA;}
"("     {return LPAR;}
")"     {return RPAR;}      
"["     {return LBRAC;}
"]"     {return RBRAC;}
"{"     {return LCURL;}
"}"     {return RCURL;}

"+"     {return ADD;}
"-"     {return SUB;}
"*"     {return MUL;}
"/"     {return DIV;}
"="     {return EQUAL;}

 /*For strings that can not be identified by any patterns specified previously
 *lex returns the value of the character
 */

  {anything}     {return(OTHER);}

%%

Here is my Yacc file:

%{

#include <stdio.h>
#define YDEBUG
#ifndef YDEBUG

#define Y_DEBUG_PRINT(x)

#else

#define Y_DEBUG_PRINT(x) printf("Yout %s \n ",x)

#endif
int yydebug = 1; 

extern char the_token[]; 
 /* This is how I read tokens from lex... :) */
extern int input_line_no; 
 /* This is the current line number */
extern char *full_line; 
 /* This is the full line */
extern int lex_state;


%}

%token STRINGCON CHARCON INTCON EQUALS NOTEQU GREEQU LESEQU GREATE LESSTH
%token ANDCOM ORCOMP SEMIC COMMA LPAR RPAR LBRAC RBRAC LCURL RCURL ABANG
%token EQUAL ADD SUB MUL DIV ID EXTERN FOR WHILE RETURN IF ELSE 
%token VOID CHAR INT OTHER

%left ORCOMP
%left ANDCOM
%left EQUALS NOTEQU
%left LESSTH GREATE LESEQU GREEQU
%left ADD SUB
%left MUL DIV
%right UMINUS
%right ABANG

%start Assign
%%

prog: dcl SEMIC prog2 {Y_DEBUG_PRINT("prog-dcl-SEMIC-prog2");}
| Function prog2 {Y_DEBUG_PRINT("prog-Function-prog2");}

prog2: {Y_DEBUG_PRINT("prog2-EMPTY");}
| dcl SEMIC prog2 {Y_DEBUG_PRINT("prog2-dcl-SEMIC-prog2");}
| Function  prog2 {Y_DEBUG_PRINT("prog2-Function-prog2");}

dcl: VAR_list {Y_DEBUG_PRINT("dcl-VAR_list");}
| ID LPAR Param_types RPAR dcl2 {Y_DEBUG_PRINT("dcl-ID-LPAR-Param_types-RPAR-dcl2");}
| EXTERN ID LPAR Param_types RPAR dcl2 {Y_DEBUG_PRINT("dcl-EXTERN-ID-LPAR-Param_types-RPAR-dcl2");}
| EXTERN Type ID LPAR Param_types RPAR dcl2 {Y_DEBUG_PRINT("dcl-EXTERN-Type-ID-LPAR-Param_types-RPAR-dcl2");}
| EXTERN VOID ID LPAR Param_types RPAR dcl2 {Y_DEBUG_PRINT("dcl-EXTERN-VOID-ID-LPAR-Param_types-RPAR-dcl2");}
| Type ID LPAR Param_types RPAR dcl2 {Y_DEBUG_PRINT("dcl-Type-ID-LPAR-Param_types-RPAR-dcl2");}
| VOID ID LPAR Param_types RPAR dcl2 {Y_DEBUG_PRINT("dcl-VOID-ID-LPAR-Param_types-RPAR-dcl2");}

dcl2: {Y_DEBUG_PRINT("dcl2-EMPTY");}
| COMMA ID LPAR Param_types RPAR dcl2 {Y_DEBUG_PRINT("dcl2-COMMA-ID-LPAR-Param_types-RPAR-dcl2");}

Function: Functionhead LCURL Functionbody RCURL {Y_DEBUG_PRINT("Function-Functionhead-LCURL-Functionbody-RCURL");}
| VOID Functionhead LCURL Functionbody RCURL {Y_DEBUG_PRINT("Function-VOID-Functionhead-LCURL-Functionbody-RCURL");}
| Type Functionhead LCURL Functionbody RCURL {Y_DEBUG_PRINT("Function-Type-Functionhead-LCURL-Functionbody-RCURL");}

Functionhead: ID LPAR Param_types RPAR {Y_DEBUG_PRINT("Functionhead-ID-LPAR-Param_types-RPAR");}

Functionbody: VAR_list STMT_list {Y_DEBUG_PRINT("Functionbody-Varlist-Stmtlist");}

Param_types: VOID {Y_DEBUG_PRINT("Param_types-VOID");}
|Param_types1 {Y_DEBUG_PRINT("Param_types-Param_types1");}

Param_types1: Param_type1 {Y_DEBUG_PRINT("Param_types1-Param_type1");}
| Param_types1 COMMA Param_type1 {Y_DEBUG_PRINT("Param_types1-Param_types1-COMMA-Param_type1");}

Param_type1: Type ID Param_type11 {Y_DEBUG_PRINT("PARAM_type1-Type-ID-Param_type11");}

Param_type11: {Y_DEBUG_PRINT("Param_type11-EMPTY");}
| LBRAC RBRAC {Y_DEBUG_PRINT("Param_type11-LBRAC-RBRAC");}

VAR_list: Type VAR_list2 {Y_DEBUG_PRINT("VAR_list-Type-VAR_list2");}

VAR_list2: var_decl {Y_DEBUG_PRINT("VAR_list2-var_decl");}
| var_decl COMMA VAR_list2 {Y_DEBUG_PRINT("VAR_list2-var_decl-COMMA-VAR_list2");}

var_decl: ID {Y_DEBUG_PRINT("var_decl-ID");}
| ID LBRAC INTCON RBRAC {Y_DEBUG_PRINT("var-decl-ID-LBRAC-INTCON-RBRAC");}

Type: CHAR {Y_DEBUG_PRINT("Type-CHAR");}
|INT {Y_DEBUG_PRINT("Type-INT");}

STMT_list: STMT2 {Y_DEBUG_PRINT("STMT-list-STMT2");}

STMT2: STMT {Y_DEBUG_PRINT("STMT2-STMT");}
| STMT STMT2 {Y_DEBUG_PRINT("STMT2-STMT-STMT2");}

STMT : IF LPAR Expr RPAR STMT {Y_DEBUG_PRINT("IF-LPARN-Expr-RPARN-STMT");}
| IF LPAR Expr RPAR STMT ELSE STMT{Y_DEBUG_PRINT("IF-LPARN-Expr-RPARN-STMT-ELSE-STMT");}
 /*if cats) ERROR*/
| IF Expr RPAR STMT ELSE STMT {warn("STMT-IF: missing LPAR");}
 /*if (cats ERROR*/
| IF LPAR Expr STMT ELSE STMT {warn("STMT-IF: missing RPAR");}
 /*two elses ERROR*/
| IF LPAR Expr STMT ELSE ELSE STMT {warn(":too many elses");}
| WHILE LPAR Expr RPAR STMT{Y_DEBUG_PRINT("STMT-WHILE-LPAR-Expr-RPAR-STMT");}
 /*for(c=0;c<1;c++)*/
| FOR LPAR Assign SEMIC Expr SEMIC Assign RPAR STMT {Y_DEBUG_PRINT("STMT-FOR-LPAR-Assign-SEMIC-Expr-SEMIC-Assign-RPAR-STMT");}
 /*for(;c<1;c++)*/
| FOR LPAR SEMIC Expr SEMIC Assign RPAR STMT {Y_DEBUG_PRINT("STMT-FOR-LPAR-SEMIC-Expr-SEMIC-Assign-RPAR-STMT");}
 /*for(;;c++)*/
| FOR LPAR SEMIC SEMIC Assign RPAR STMT {Y_DEBUG_PRINT("STMT-FOR-LPAR-SEMIC-SEMIC-Assign-RPAR-STMT");}
 /*for(;;)*/
| FOR LPAR SEMIC SEMIC RPAR STMT {Y_DEBUG_PRINT("STMT-FOR-LPAR-SEMIC-SEMIC-RPAR-STMT");}
 /*for(c=0;;)*/
| FOR LPAR Assign SEMIC SEMIC RPAR STMT {Y_DEBUG_PRINT("STMT-FOR-LPAR-Assign-SEMIC-SEMIC-RPAR-STMT");}
 /*for(c=0;c<1;)*/
| FOR LPAR Assign SEMIC Expr SEMIC RPAR STMT {Y_DEBUG_PRINT("STMT-FOR-LPAR-Assign-SEMIC-Expr-Semic-RPAR-STMT");}
 /*for(c=0;;c++)*/
| FOR LPAR Assign SEMIC SEMIC Assign RPAR STMT {Y_DEBUG_PRINT("STMT-FOR-LPAR-Assign-SEMIC-SEMIC-ASSIGN-RPAR-STMT");}
 /*for(;c<1;)*/
| FOR LPAR SEMIC Expr SEMIC RPAR STMT {Y_DEBUG_PRINT("STMT-FOR-LPAR-SEMIC-Expr-SEMIC-RPAR-STMT");}
 /*for() ERROR*/
| FOR LPAR RPAR STMT {warn("STMT-FOR: empty statement");}
 /*for{;;;) ERROR*/
| FOR LPAR SEMIC SEMIC SEMIC RPAR {warn("STMT-FOR: too many semicolons");}
 /*for;;) ERROR*/
| FOR SEMIC SEMIC RPAR STMT {warn("STMT-FOR: missing LPAR");}
 /*for(;; ERROR*/   
| FOR LPAR SEMIC SEMIC STMT {warn("STMT-FOR: missing RPAR");}
| RETURN Expr SEMIC {Y_DEBUG_PRINT("STMT-RETURN-Expr-SEMIC");}
| RETURN SEMIC {Y_DEBUG_PRINT("STMT-RETURN-SEMIC");}
 /*return ERROR*/
| RETURN {warn("STMT-Return:missing semicolon");}
| Assign SEMIC {Y_DEBUG_PRINT("STMT-Assign-SEMIC");}
/*function call*/
| ID LPAR RPAR SEMIC {Y_DEBUG_PRINT("STMT-ID-LPAR-RPAR-SEMIC");}
| ID LPAR Expr Expr2 RPAR SEMIC {Y_DEBUG_PRINT("STMT-ID-LPAR-Expr-Expr2-RPAR-SEMIC");}
 /*No semic ERROR*/
| ID LPAR Expr Expr2 RPAR {warn(":missing semicolon");}  
| LCURL STMT2 RCURL {Y_DEBUG_PRINT("STMT-LCURL-STMT-RCURL");}
| LCURL RCURL {Y_DEBUG_PRINT("STMT-LCURL-RCURL");}
| SEMIC {Y_DEBUG_PRINT("STMT-SEMIC");}

Assign : ID Assign1 EQUAL Expr {Y_DEBUG_PRINT("Assign-1-ID-Assign1-EQUAL-Expr");}
 /*Error no semi*/
| Assign {warn( "Assign: missing semicolon on line");}

Assign1 : {Y_DEBUG_PRINT("Assign1-1-Empty"); }
| LBRAC Expr RBRAC {Y_DEBUG_PRINT("Assign1-2-LBRAC-Expr-RBRAC"); }
| LBRAC Expr error { warn("Assign1: missing RBRAC"); }
| error Expr RBRAC { warn("Assign1: missing LBRAC"); }
| LBRAC error RBRAC { warn("Assign1: Invalid array index"); }

Expr : SUB Expr %prec UMINUS {Y_DEBUG_PRINT("Expr-1-UMINUS Expr"); }
| ABANG Expr {Y_DEBUG_PRINT("Expr-2-ANABG Expr"); }
| Expr Binop Expr {Y_DEBUG_PRINT("Expr-3-Expr-Binop-Expr"); }
| Expr Relop Expr {Y_DEBUG_PRINT("Expr-4-Expr-Binop-Expr"); }
| Expr Logop Expr {Y_DEBUG_PRINT("Expr-5-Expr-Logop-Expr"); }
| ID {Y_DEBUG_PRINT("Expr-6-ID"); }
| ID LPAR RPAR {Y_DEBUG_PRINT("Expr-ID-LPAR-RPAR");} 
| ID LPAR Expr Expr2 RPAR {Y_DEBUG_PRINT("Expr-ID-LPAR-Expr-Expr2-RPAR");}
| ID LBRAC Expr RBRAC {Y_DEBUG_PRINT("Expr-ID-LBRAC-Expr-RBRAC");}
| LPAR Expr RPAR {Y_DEBUG_PRINT("Expr-7-LPARN-Expr-RPARN");} 
| INTCON { Y_DEBUG_PRINT("Expr-8-INTCON"); }
| CHARCON { Y_DEBUG_PRINT("Expr-9-CHARCON"); }
| STRINGCON { Y_DEBUG_PRINT("Expr-10-STRINGCON"); }
| Array {Y_DEBUG_PRINT("Expr-11-Array"); }
| error {warn("Expr: invalid expression "); }

/*top is for no expression 2*/
Expr2: {Y_DEBUG_PRINT("Expr2-Empty");}
| COMMA Expr {Y_DEBUG_PRINT("Expr2-COMMA-Expr");}
 /*recursively looks for another expression in function call (exp1,exp2,exp3,...*/
| COMMA Expr Expr2 {Y_DEBUG_PRINT("Expr2-COMMA-Expr-Expr2");}


Array : 
ID LBRAC Expr RBRAC {Y_DEBUG_PRINT("Array-1-ID-LBRAC-Expr-RBRAC"); }
| ID error RBRAC {warn( "Array: invalid array expression"); }

Binop : ADD {Y_DEBUG_PRINT("Binop-1-ADD"); }
| SUB {Y_DEBUG_PRINT("Binop-2-SUB"); }
| MUL {Y_DEBUG_PRINT("Binop-3-MUL"); }
| DIV {Y_DEBUG_PRINT("Binop-4-DIV"); }

Logop : ANDCOM {Y_DEBUG_PRINT("Logop-1-ANDCOM"); }
| ORCOMP {Y_DEBUG_PRINT("Logop-2-ORCOMP"); }

Relop : EQUALS { Y_DEBUG_PRINT("Relop-1-EQUALS"); }

| NOTEQU { Y_DEBUG_PRINT("Relop-2-NOTEQU"); }

| LESEQU { Y_DEBUG_PRINT("Relop-3-LESEQU"); }

| GREEQU { Y_DEBUG_PRINT("Relop-4-GREEQU"); }

| GREATE { Y_DEBUG_PRINT("Relop-5-GREATE"); }

| LESSTH { Y_DEBUG_PRINT("Relop-6-LESSTH"); }


%%

main()
{
int result = yyparse();
if (lex_state==1) {
yyerror("End of file within a comment");
}
if (lex_state==2) {
yyerror("End of file within a string");
}
return result;
} 
int yywrap(){
return 1;
}
yyerror(const char *s)
{
fprintf(stderr, "%s on line %d\n",s,input_line_no);
} 
warn(char *s)
{
fprintf(stderr, "%s\n", s);
}

This is supposed to be the lexical analyzer and parser part of a compiler. I'm trying to parse code according to the C-- grammar rules. I've tried adding an end-of-file rule to the lexer, but it just produced the same error as before, so I don't think that's the issue.

Is there any way to tell if the Lexer has passed the right tokens to the parser?

Also, I've been told to check the y.output file in order to see how to fix the shift/reduce conflicts in my parser, but I looked at the file and don't know how to tell where the conflicts are.

I know there are probably lots of issues with my Yacc code, but currently I'm just trying to fix this one single error so I can work on the others. Any help would be very much appreciated, thank you.

Your grammar's start rule is Assign and that will not match an empty input. So it's a syntax error. It would also be a syntax error if the input had two assignments. If you tell bison to match an Assign, that is what it will try to do, nothing more and nothing less. To see what the parser is receiving, turn on bison's trace facility. — rici
bison manual: debugging your parser (which will also eliminate the need for all that Y_DEBUG_PRINT stuff.) — rici
@rici Thank you very much, your link is actually really really helpful! I changed my Yacc file so that it reads %start prog and then I added a grammar to prog so that it could just be empty. This seems to fix the error for an empty file, but does not fix the error for a file with just comments. Do you know why that is? Also, how would I go about turning on the trace facility? — Alexia Paskevicius
I imagine it's because your comment start states are non-exclusive, so the content is being matched by an untagged rule. Check the difference between %x and %s in the flex manual. And turn on tracing in your parser so you can tell what's going on. (Flex can also generate debugging output. See the -d command line flag.) — rici
Flex manual debugging section. Also see chapter 10 on start conditions. (Although both comments and strings can easily be matched with regular expressions). — rici
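
The %x suggestion in the comments above can be sketched as follows. With %Start (or %s) the start conditions are inclusive, so untagged rules such as {id} stay active inside a comment, and because flex prefers the longest match, a whole word in the comment wins over the one-character [^*] rule and is returned to the parser as a token. This is a minimal standalone flex file, not the asker's lexer: the single state name "comment" is a hypothetical replacement for comment_in/comment_out, while lex_state and input_line_no are reused from the question.

```lex
%{
#include <stdio.h>
int input_line_no = 1;
int lex_state = 0;
%}

%option noyywrap
%x comment

%%

"/*"            { /* enter the exclusive state: untagged rules are now off */
                  lex_state = 1; BEGIN(comment); }
<comment>"*/"   { /* end of comment, back to normal scanning */
                  lex_state = 0; BEGIN(INITIAL); }
<comment>\n     { input_line_no++; }
<comment>.      { /* discard comment text */ }

\n              { input_line_no++; }
.               { /* placeholder for the real token rules */ }

%%

int main(void)
{
    yylex();
    if (lex_state == 1)
        fprintf(stderr, "End of file within a comment on line %d\n",
                input_line_no);
    return lex_state;
}
```

Because the state is exclusive, only the rules tagged <comment> can match until the closing "*/" is seen, so a file that contains nothing but comments produces no tokens at all.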

Chris Dodd · Accepted Answer · 2018-10-22T02:50:07

The problem is that your grammar does not match an empty input, so if you give it an empty input, it will flag it as a syntax error. Your start symbol is Assign, which needs at least an ID, and the prog rule would not help as written either: prog2 can be empty, but prog itself has no empty alternative, so it requires at least one declaration or function.

Normally, you would just use a single top-level program rule that matches zero or more declarations or functions:

prog: /* empty */
    | prog dcl SEMIC
    | prog Function
    ;
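
On the tracing question raised in the comments, here is a minimal sketch of a grammar file with an empty-capable prog rule and the trace code compiled in. It assumes yylex() comes from the flex-generated lexer, and it uses ID SEMIC as a hypothetical stand-in for the real dcl and Function rules. Note that the `int yydebug = 1;` line in the question has no effect on its own, because the generated parser only contains the trace code when YYDEBUG is nonzero (or when yacc/bison is run with -t).

```yacc
%{
#include <stdio.h>
#define YYDEBUG 1   /* compile the trace code into the generated parser */
int yylex(void);    /* assumed to come from the flex-generated lexer */
void yyerror(const char *s);
%}

%token ID SEMIC

%start prog

%%

prog: /* empty */       /* lets an empty file parse */
    | prog ID SEMIC     /* stand-in for "prog dcl SEMIC" */
    ;

%%

int main(void)
{
    yydebug = 1;        /* trace each shift and reduce on stderr */
    return yyparse();
}

void yyerror(const char *s)
{
    fprintf(stderr, "%s\n", s);
}
```

With the trace on, every token the parser reads is reported on stderr, which also shows whether the lexer is handing over the tokens you expect.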
