I have finished my lex file and started learning yacc, but I have a question about part of my lex code:

%{
#include "y.tab.h"
int num_lines = 1;
int comment_mode=0;
int stack =0;
%}
digit ([0-9])
integer ({digit}+)
float_num ({digit}+\.{digit}+)
%%
{integer} {  //deal with integer 
                printf("#%d: NUM:",num_lines); ECHO;printf("\n");
                yylval.Integer = atoi(yytext);
                return INT;
               }
{float_num} {// deal with float
                 printf("#%d: NUM:",num_lines);ECHO;printf("\n");
                 yylval.Float = atof(yytext);
                 return FLOAT;
                 }
\n         { ++num_lines; }
.          if(strcmp(yytext," "))ECHO;
%%
int yywrap() {
return 1;
}

Every time I get an integer or a float, I save its value in yylval and return the token. Here is my code in parser.y:

%{
#include <stdio.h>
#define YYDEBUG 1  
void yyerror (char const *s) {
fprintf (stderr, "%s\n", s);
}
%}
%union{
int Integer;
float Float;
}
%token <int>INT;
%token <float>FLOAT;
%%
statement :
         INT  {printf("int yacc\n");}
      | FLOAT {printf("float yacc\n");}
      |
      ;
%%
int main(int argc, char** argv)
{
yyparse();
return 0;
}

which I compiled with:
byacc -d parser.y

lex lex.l

gcc lex.yy.c y.tab.c -ll

Since I just want something easy to get started, I want to see whether I can parse only integers and floats first, so I print them in both the .l and the .y file after I enter a number. At the beginning I enter a first random number, for example 123, and my program prints:

1: NUM: 123

in yylex() and

"int yacc\n"

in parser.y. But if I then enter a second number, it reports a syntax error and the program exits. I don't know where the problem is. Is there any solution?

1 Answer

Your grammar only accepts a single token, either an INT or a FLOAT. So it will only accept a single number, which is why it produces a syntax error when it reads the second number; it is expecting an end-of-file.

The solution is to change the grammar so that it accepts any number of "statements":

program: /* EMPTY */
       | program statement
       ;

Two notes:

1) You don't need an (expensive) strcmp in your lexer. Just do this:

" "    /* Do nothing */;
.      { return yytext[0]; }

It's better to return the unknown character to the parser, which will produce a syntax error if the character doesn't correspond to any token type (as in your simple grammar), than to just echo the character to stdout, which will prove confusing. Some people prefer to produce an error message in the lexer for invalid input, but while you are developing a grammar I think it is easier to just pass the characters through, because that lets you add operators to your parser without regenerating the lexer.

2) When you declare the types of tokens or nonterminals in bison (with %token or %type), you use the tag name from the union, not the C type. Some (but not all) versions of bison let you get away with using the C type if it is a simple type, but you can't count on it; it's not POSIX standard, and it may well break if you use an older or newer version of bison. (For example, it won't work with bison 3.0.) So you should write, for example:

%union{
  int Integer;
  float Float;
}
%token <Integer>INT;
%token <Float>FLOAT;