I'm writing a parser in Bison for a basic compiler, which I plan to expand later to support subroutines and dynamic memory allocation. The grammar is defined in Appendix A of the Dragon Book. My Flex scanner works: I ran my test files through it and it printed all the correct tokens. Sorry about the strange formatting below. My Bison and Flex modes in Emacs are a little haywire, so I'm just using C mode until I fix them.

%{
#include <stdio.h>

#define YYERROR_VERBOSE 1066

  extern FILE* yyin ;
  extern int yylineno ;
  extern char* yytext ;
  extern int yylex() ;
  extern void yyerror() ;
  int YYDEBUG = 1 ;

%}

/* Tokens */
%token AND BASIC BREAK DO ELSE EQ FALSE
%token GREQ ID IF INDEX LEEQ MINUS NOEQ NUM OR REAL TEMP TRUE WHILE

/* Grammar rules from Appendix A */
%%
program: block { printf( "Matched program\n" ) ; }
;

block: '{' decls stmts '}' { printf( " Matched block\n" ) ; }
;

decls: decls decl |
;

decl: type ID ';'
;

type: type '[' NUM ']' | BASIC
;

stmts: stmts stmt |
;

stmt: loc '=' bool ';'
| IF '(' bool ')' stmt
| IF '(' bool ')' stmt ELSE stmt
| WHILE '(' bool ')' stmt
| DO stmt WHILE '(' bool ')' ';'
| BREAK ';'
| block
;

loc: loc '[' bool ']' | ID
;

bool: bool OR join | join
;

join: join AND equality | equality
;

equality: equality EQ rel | equality NOEQ rel | rel
;

rel: expr '<' expr | expr LEEQ expr | expr GREQ expr | expr '>' expr | expr
;

expr: expr '+' term | expr '-' term | term
;

term: term '*' unary | term '/' unary | unary
;

unary: '!' unary | '-' unary | factor
;

factor: '(' bool ')' | loc | NUM | REAL | TRUE | FALSE
;

%%

/*
 * Additional C Code
 * Main Routine
 * yyerror()
 */
int main( int argc, char *argv[] ) {

  int i ;

  if ( argc < 2 ) {
    fprintf( stderr, "No input files.\n\n" ) ;
    return 1 ;
  }

  for ( i = 0 ; i < argc ; i++ ) {

    yyin = fopen( argv[i], "r" ) ;

    if ( !yyin ) {
      fprintf( stderr, "Error opening file: %s.\n\n", argv[i] ) ;
      return 1 ;
    }

    yyparse() ;
  }
  return 0 ; 
}

void yyerror( char *s ) {

  /* fprintf( stderr, "Error parsing - %d: %s at %s\n", yylineno, s, yytext ) ; */
  fprintf( stderr, "Error parsing - %d: %s at %s\n", yylineno, s, yytext ) ;

}

I feel like I might be missing something important. I don't think it's the rules. I set yyin to be the input files provided in argv[]. The errors are

Error parsing - 1: syntax error, unexpected TRUE, expecting '{' at

Error parsing - 1: syntax error, unexpected FALSE, expecting '{' at ELF

Any help would be greatly appreciated!

EDIT: If I change the main function to not set yyin (so yyparse just reads from stdin), I get this:

{ int x; }

Error parsing - 1: syntax error, unexpected TRUE, expecting '{' at {

I don't understand how that is wrong...

Please show the relevant part of the input file. - JSBձոգչ
The error occurs at the very beginning of input. I get those same errors with every input file, some of which are as simple as { int x; } or { do (x) while (true); }, which is syntactically correct but not semantically. - Kizaru
How did that compile with retrun 0? Is this your exact code? - jamesdlin
Not sure I'd use "FALSE" like that. Won't bison #define FALSE to something like 264? (Assuming you're using bison -d.) Any situation where FALSE is, well, not false, seems like asking for trouble. - bstpierre
(I stand by that statement for TRUE too. Too common a name, you'll get in trouble with it.) - bstpierre

2 Answers

2
votes

When I run your sample input above using a stub yylex, the input program matches. I'm making the assumption that "int" tokenizes as BASIC. (You also need to fix "retrun".)

You need to debug your lexer. Either attach a debugger so you can see what it is returning, or put a print statement at the end of yylex.

The stub below replaces everything after the second %% in your grammar file:

%%
FILE* yyin = NULL;
int yylineno = 0;
char* yytext = NULL;
int main()
{
  yyparse() ;
  return 0 ; 
}

void yyerror( char *s )
{
  fprintf( stderr, "Error parsing - %d: %s at %s\n", yylineno, s, yytext ) ;
}

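int yylex()
{
    static int i = 0;
    static int tokens[] = { '{', BASIC, ID, ';', '}' };

    /* Return 0 (end of input) once the canned tokens run out,
       instead of reading past the end of the array. */
    if ( i >= (int)( sizeof tokens / sizeof tokens[0] ) )
        return 0;

    yylineno++;
    return tokens[i++];
}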
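
For the print-statement approach suggested above, here is a minimal sketch of a tracing wrapper. It assumes the real scanner can be reached through a function pointer; describe_token, traced_lex and lex_fn are hypothetical names for illustration, not part of flex or bison.

```c
#include <assert.h>
#include <stdio.h>

/* Signature of a scanner entry point such as flex's yylex(). */
typedef int (*lex_fn)(void);

/*
 * Format a token code for logging. Codes from 1 to 255 are literal
 * characters such as '{'; bison numbers named tokens above that range.
 */
const char *describe_token(int tok, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    if (tok == 0)
        snprintf(buf, size, "end of input");
    else if (tok > 0 && tok < 256)
        snprintf(buf, size, "'%c' (%d)", tok, tok);
    else
        snprintf(buf, size, "named token %d", tok);
    return buf;
}

/*
 * Hypothetical wrapper: call the real scanner, log what it returned,
 * then hand the token on to the parser unchanged.
 */
int traced_lex(lex_fn real_lex, FILE *log)
{
    char buf[32];
    int tok = real_lex();
    fprintf(log, "yylex -> %s\n", describe_token(tok, buf, sizeof buf));
    return tok;
}
```

One way to wire this in, if it suits your setup, is to rename the generated scanner with flex's YY_DECL macro and have yylex() call traced_lex on the renamed function. If the number logged for '{' is not 123, or a named token's number differs from the one in the header generated by bison -d, the scanner and parser disagree about token codes.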
0
votes

Without seeing the output of your tokenizer on the same input, it's hard to say where the parser is failing.

Myself, I tokenize everything, turning '{' and '}' into LC and RC, etc., so that the only things that come out of the tokenizer are named tokens. That makes it easier to see where every single character of your source file is being handled.
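
As a rough sketch of that idea, the mapping from punctuation to named tokens could look like the following. Only LC and RC come from the description above; the other names, the numeric codes, and punct_to_token itself are assumptions for illustration. In a real build the codes would come from the header generated by bison -d.

```c
/* Hypothetical token codes, chosen only so the example is self-contained. */
enum punct_token {
    LC = 300,   /* '{' */
    RC,         /* '}' */
    LP,         /* '(' */
    RP,         /* ')' */
    LB,         /* '[' */
    RB,         /* ']' */
    SEMI,       /* ';' */
    ASSIGN,     /* '=' */
    BAD_CHAR    /* anything the scanner does not recognize */
};

/* Map one punctuation character to its named token. */
int punct_to_token(int c)
{
    switch (c) {
    case '{': return LC;
    case '}': return RC;
    case '(': return LP;
    case ')': return RP;
    case '[': return LB;
    case ']': return RB;
    case ';': return SEMI;
    case '=': return ASSIGN;
    default:  return BAD_CHAR;
    }
}
```

With this scheme the grammar would refer to LC and RC instead of '{' and '}', and a stray character shows up as one explicit BAD_CHAR token rather than passing through as a raw character code.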

If you move each token of your source file to a separate line, such as

{
int
x
;
}

what error is reported?