I'm trying to implement a simple lexer and parser using flex and bison.

All I want is to parse these:

  • a
  • a,b
  • a ,b
  • a, b
  • a,b,c
  • a,b , c
  • ....

Just a sequence of keys separated by commas, with optional whitespace around each comma. Here is my grammar:

KEY_SET             : KEY
                      {
                        printf("keyset 1");
                      }
                      | KEY COMMA KEY_SET
                      {
                        printf("keyset 2");
                      };

KEY and COMMA are declared as tokens with %token.

But it gives me a syntax error whenever I press Enter or type any whitespace.

So I also declared IGNORE as [ \t\n] in flex, and added a new rule to the parser:

IGNORE_BLOCK        : IGNORE
                      {
                        printf("\n...ignoring...\n");
                      };

But this rule never comes into play.

I keep getting a syntax error.

How can I resolve this?

Lexer:

%{
    #include "y.tab.h"
%}
%option noyywrap
COMMA                   [,]
KEY                     [[:alpha:][:alnum:]*]
IGNORE                  [ \t\n]
%%
{COMMA}                 {return COMMA;}
{KEY}                   {return KEY;}
{IGNORE}                {return IGNORE;}
.                       {printf("Exiting...\n");exit(0);}
%%

Parser:

%{
    #include<stdio.h>
    void yyerror (char const *s);
    int yywrap();
    //int extern yylex();
%}
%token      COMMA
%token      KEY
%token      IGNORE
%%
KEY_SET             : KEY
                      {
                        printf("keyset 1");
                      }
                      | KEY COMMA KEY_SET
                      {
                        printf("keyset 2");
                      };

IGNORE_BLOCK        : IGNORE
                      {
                        printf("\n...ignoring...\n");
                      };

%%
int main(int argc, char **argv)
{
    while(1)
    {
      printf("****************\n");
      yyparse();
      char ign;
      scanf("%c",&ign);
    }
    return 0;
}
int yywrap()
{
   return 1;
}
void yyerror (char const *s) {
   fprintf (stderr, "%s\n", s);
}

Commands I'm using to build:

flex test.l
bison -dy test.y
gcc lex.yy.c y.tab.c -o test.exe

You'll have to include your flex file in your question. That's almost certainly where your problem is. – rici
@rici I have added all the code. – Maifee Ul Asad

1 Answer

Your flex file contains a series of rules, each consisting of a pattern and an action. Contrary to popular belief, you do not need to "declare" your patterns before using them.

If you want to ignore whitespace in your lexer, you need a rule which does nothing.

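You had an error in your key pattern, which I fixed. [[:alpha:][:alnum:]*] is a single character class, so it matches exactly one character (a letter, a digit, or a literal *) and would not have accepted keys with more than one character. Also, it is very bad style to call exit in your scanner. Let the parser deal with errors.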

%{
    #include "y.tab.h"
%}
%option noyywrap
%%
   /* Removed the COMMA rule. See text below. */
   /* ","               {return COMMA;} */
   /* Compare this pattern with the one you used */
[[:alpha:]][[:alnum:]]* {return KEY;}
   /* Recognise and ignore whitespace. */
[[:space:]]+            ; /* Do nothing */
   /* Send unrecognised input to the parser. */
.                       {return *yytext;}
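
The difference between the two key patterns can be checked outside flex. The following is a minimal sketch that assumes a POSIX system with <regex.h>, whose extended regular expressions read these bracket expressions the same way flex does. matches_whole and the two macro names are hypothetical helpers, not part of flex or of the original code. Flex matches the longest prefix of the remaining input, while this helper tests a whole string, so it only illustrates which strings each pattern can match as a single token.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* The pattern from the question and the corrected one (names are hypothetical). */
#define ORIGINAL_KEY "[[:alpha:][:alnum:]*]"
#define FIXED_KEY    "[[:alpha:]][[:alnum:]]*"

/* Returns true if `text`, taken as a whole, matches the POSIX extended
   regular expression `pattern`. */
bool matches_whole(const char *pattern, const char *text)
{
    char anchored[128];
    regex_t re;
    bool ok;
    int len;

    assert(pattern != NULL && text != NULL);

    /* Anchor the pattern so that it has to cover the entire string. */
    len = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (len < 0 || (size_t)len >= sizeof anchored)
        return false;                     /* pattern too long for this sketch */

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                     /* pattern did not compile */

    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

Under these assumptions, matches_whole(ORIGINAL_KEY, "ab") is false while matches_whole(FIXED_KEY, "ab") is true. With the original pattern, input such as ab is scanned as two KEY tokens in a row, which the grammar rejects.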

Your parser does not need IGNORE or IGNORE_BLOCK. The rule was pointless anyway, because IGNORE_BLOCK cannot be derived from the start symbol KEY_SET, so the grammar never accepts an IGNORE token. Bison probably warned you about that.

You can simplify your parser in some other ways:

  • yywrap is not needed, since your lexer has %option noyywrap.
  • The COMMA terminal can be written as ',' if you just remove the "," pattern from the lexer (since the fallback rule
    .  { return *yytext; }
    
    will work correctly for any single-character literal).

For testing, you probably want to parse one line at a time instead of ignoring syntax errors.

I'd also recommend not using the "legacy" flag -y when you invoke bison; that flag should only be used on old existing yacc grammar files, since it may interfere with modern bison features. Without -y, bison will write the generated C code to filename.tab.c and the generated header to filename.tab.h. If you don't like those names, you can use the -o flag to specify the name of the generated C code (and the header will have the same name, with the extension changed to .h).

That might produce something like this:

(Note that I changed KEY_SET to key_set because the usual style in grammars is that ALL_CAPS are tokens, while non-terminals are lower-case. I also changed it from right-recursive to left-recursive to avoid a problem you would notice if your production action printed the value of the KEY token, assuming your lexer had given it a value.)

file parser.y

%{
    #include<stdio.h>
    void yyerror (char const *s);
    int yylex(void);
    /* Defined in the flex file */
    void set_input(const char* input);
%}
%token  KEY
%%
key_set : KEY             { printf("keyset 1\n"); }
        | key_set ',' KEY { printf("keyset 2\n"); };
%%
int main(int argc, char **argv)
{
    char buffer[BUFSIZ];
    while (1)
    {
      printf("****************\n");
      char* input = fgets(buffer, sizeof buffer, stdin);
      if (input == NULL) break;  /* fgets returns NULL at end of input or on error */
      set_input(input);
      yyparse();
    }
    return 0;
}

void yyerror (char const *s) {
   fprintf (stderr, "%s\n", s);
}
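
The reason for switching key_set to left recursion can be illustrated with a small hand-written model of when an LR parser runs the rule actions. This is only a sketch, not bison output: the function names and MAX_KEYS are hypothetical, each key is assumed to be a single character, and the explicit stack stands in for the parser stack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEYS 64  /* arbitrary limit for this sketch */

/* Models  key_set : KEY | key_set ',' KEY
   Each KEY can be reduced as soon as it has been read, so the actions
   see the keys in input order. `out` must hold strlen(keys) + 1 bytes. */
void left_recursive_order(const char *keys, char *out)
{
    size_t n = strlen(keys);

    assert(n <= MAX_KEYS);
    for (size_t i = 0; i < n; ++i)
        out[i] = keys[i];                /* action runs right after keys[i] is read */
    out[n] = '\0';
}

/* Models  KEY_SET : KEY | KEY COMMA KEY_SET
   Nothing can be reduced until the last KEY has been read, so every key
   is pushed first and the actions run while the stack unwinds. */
void right_recursive_order(const char *keys, char *out)
{
    char stack[MAX_KEYS];
    size_t depth = 0, written = 0, n = strlen(keys);

    assert(n <= MAX_KEYS);
    for (size_t i = 0; i < n; ++i)
        stack[depth++] = keys[i];        /* shift every key */
    while (depth > 0)
        out[written++] = stack[--depth]; /* reduce, running the action */
    out[written] = '\0';
}
```

For the keys a, b, c the first function yields abc and the second yields cba, which is the reversed order the right-recursive actions would show if they printed each key.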

file lexer.l

%{
    #include "parser.tab.h"
%}
%option noinput nounput nodefault yylineno
%option noyywrap
%%
[[:alpha:]][[:alnum:]]* {return KEY;}
[[:space:]]+            ; /* Do nothing */
.                       {return *yytext;}
%%
static YY_BUFFER_STATE flex_buffer;
void set_input(const char* input) {
  yy_delete_buffer(flex_buffer);
  flex_buffer = yy_scan_string(input);
}

Build procedure:

flex lexer.l
bison -d parser.y
gcc lex.yy.c parser.tab.c -o parser.exe

Your grammar does not allow empty input, but that's fine. For testing purposes, though, you might want to add a test in the loop which reads input lines to only call the parser if the line is not empty.
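
The empty-line check could look something like this. It is a minimal sketch: is_blank_line is a hypothetical helper, not part of flex or bison, and it assumes a line counts as empty when it contains nothing but whitespace (including the trailing newline that fgets keeps).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `line` contains only whitespace characters (or nothing). */
bool is_blank_line(const char *line)
{
    assert(line != NULL);
    for (; *line != '\0'; ++line) {
        /* The cast avoids undefined behaviour for negative char values. */
        if (!isspace((unsigned char)*line))
            return false;
    }
    return true;
}
```

In the main loop of parser.y, a line such as if (is_blank_line(input)) continue; placed after the NULL check and before set_input(input) would skip such lines without calling yyparse.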