Implementations of the «syntax» flex using tools flex and bison

Question

I'm having problems with the subject. I need to implement the syntax tools flex. For this I wrote Context-free grammar. Here is a sample of the kind of examples that I should be able to handle:

№1:

Rname 0|1
String1 {Rname}{Rname}*(111)
String2 {Rname}{Rname}*(000)
%%

№2:

%%
Rname   0|1
String1 {Rname}{Rname}*(111)
String2 {Rname}{Rname}*(000)
%%
{String1}   {return 1;}
{String2}   {return 1;}
%%

№3:

Name        [a-zA-Z][a-zA-Z0-9]*
Words       [a-zA-Z0-9]*
%%
{Identifier} {return AP_Name;}
{Words} {return AP_Words;}
%%

The most complex example, which takes into account all of the options:

A "abc" | "cba"
B "qwe"
C 111
D rty* | rty+
E ({C | D})*
%%
{String1} {return 1;}
{String2} {return 1;}

{A} {return AP_A;}
{B} {return AP_B;}
{C} {return AP_C;}
{D} {return AP_D;}
{E} {return AP_E;}
%%

I have a file for flex as follows:

%option noyywrap
%option yylineno
%option never-interactive
%{
#include <stdio.h>
#include "bison.tab.h"
%}

%%
[a-zA-Z][a-zA-Z0-9]*     {return AP_Name;}
[a-zA-Z0-9]*         {return AP_Words;}

\(      {return AP_Bracket_open1;}
\)      {return AP_Bracket_close1;}

\[      {return AP_Bracket_open2;}
\]      {return AP_Bracket_close2;}

\{      {return AP_Bracket_open3;}
\}      {return AP_Bracket_close3;}

\+      {return AP_Plus;}
\*      {return AP_Multiply;}
\ '|'       {return AP_Or;}
'%%'        {return AP_Percentage;}
\;      {return AP_Semicolon;}
\-      {return AP_Dash;}
\"      {return AP_Quote;}
%%

And there is a file for bison as follows:

%{
#include <stdio.h>
extern int yylineno;
void yyerror(char const *msg)
{
    fprintf(stderr, "%d: %s\n", yylineno, msg);
}
int yyparse();
#define YYPRINT(file, type, value) fprintf(file, "%d", value);
%}
%token AP_Name
%token AP_Words
%token AP_Bracket_open1
%token AP_Bracket_close1
%token AP_Bracket_open2
%token AP_Bracket_close2
%token AP_Bracket_open3
%token AP_Bracket_close3
%token AP_Plus
%token AP_Multiply
%token AP_Or
%token AP_Percentage
%token AP_Semicolon
%token AP_Dash
%token AP_Quote
%%
S : Block1 AP_Percentage Block2 AP_Percentage;

Identifier : AP_Name | AP_Words;

Block1 : AP_Name Patern Block1 | ;
Patern : Regex | Regex Patern;

Regex  : Value Plurality /* abc* */ /* abc+ */
       | AP_Bracket_open1 Value AP_Bracket_close1 Plurality  /* [abc] */ /* [abc]* */ /* [abc]+ */
       | AP_Bracket_open2 Value AP_Bracket_close2 Plurality  /* (abc) */ /* (abc)* */ /* (abc)+ */
       | AP_Bracket_open3 Value AP_Bracket_close3 Plurality; /* {abc} */ /* {abc}* */ /* {abc}+ */

Plurality : AP_Plus | AP_Multiply |; /* + * */

Value : Identifier /* abc */
      | AP_Quote Identifier AP_Quote /* "abc" */
      | Identifier AP_Or Identifier /* abc | cba*/
      | Identifier AP_Dash Identifier; /*abc - cba*/

Block2 : AP_Bracket_open3 Identifier AP_Bracket_close3 /* {abc} */
       | AP_Bracket_open3 Identifier AP_Bracket_close3 AP_Bracket_open3 Identifier AP_Bracket_close3 /* {abc}{abc} */
       |
       ;
%%
extern FILE *yyin;
int main()
{
    yydebug=1;
    yyin = fopen("test.txt","r");
    if (yyparse() != 0)
        return 0;
    else
    {
        printf("Success\n");
        return 0;
    }
}

At work I use the following set of parameters:

flex lex.l
bison bison.y -d -t
cc lex.yy.c bison.tab.c bison.tab.h

Now I'm trying to process my first example, and at the end I have the following error:

Reducing stack by rule 4 (line 31):
   $1 = token AP_Name (0)
   $2 = nterm Patern ()
   $3 = nterm Block1 ()
-> $$ = nterm Block1 ()
Stack now 0
Entering state 3
Now at end of input.
1: syntax error
Error: popping nterm Block1 ()
Stack now 0
Cleanup: discarding lookahead token $end (0)
Stack now 0

Do I understand correctly that the bison want to read the end of the file, it see the end of the file and at the same time off with an error that the end of the file read failed? Grammar itself is not yet unfinished (not sure I know how to finish it), but I want it work at least at this stage.
And sorry for my English.

UPD:

I found a rule that resulted in an error - it Block1: AP_Name Patern Block1 |; When I try to rewrite the grammar, I got this:

%token AP_Name
%token AP_Word
%token AP_Percentage
%token AP_Plus
%token AP_Or
%token AP_Multiply
%token AP_Bracket_open1
%token AP_Bracket_close1
%token AP_Bracket_open2
%token AP_Bracket_close2
%token AP_Bracket_open3
%token AP_Bracket_close3
%token AP_Dash
%token AP_Quote
%%
S : Block1 AP_Percentage Block2 AP_Percentage;

Identifier : AP_Word | AP_Name;

Block1 : AP_Name Regex Block1 | ;

Regex : BracketO Regex
      | Identifier AP_Or Regex /* a|b */
      | Identifier Regex
      | Identifier Plurality Regex /* abc+ or abc* */
      | AP_Quote Identifier AP_Quote /* "abc" */
      |
      ;

BracketO : AP_Bracket_open1 Class BracketC Plurality BracketO /* (abc) */
         | AP_Bracket_open2 Class BracketC Plurality BracketO /* [abc] */
         | AP_Bracket_open3 AP_Name AP_Bracket_close3 Plurality BracketO /* {abc} */
         |
         ;

BracketC : AP_Dash Class BracketC /* [a-b] */
         | AP_Or Class BracketC /* [a|b] or (a|b) */
         | AP_Bracket_close1 /* ((abc)) */
         | AP_Bracket_close2; /* [[abc]] */

Plurality : AP_Multiply | AP_Plus | ; /* * + */

Class : Identifier | BracketO; /* ({Rname}) */

Block2 : AP_Bracket_open3 Identifier AP_Bracket_close3 /* {abc} */
       | AP_Bracket_open3 Identifier AP_Bracket_close3 AP_Bracket_open3 Identifier AP_Bracket_close3 /* {abc} {cba} */
       |
       ;
%%

I don’t like it, BUT it works whit 36 conflict shift/reduce. (this is what I wrote in the comment)

@Rein: Yes, there is one shift/reduce conflict, but it is seen as a warning, not an error, so as I understand it should work, but to get a response, will require more iterations. And it shows in the results of debug (bison in the analysis comes to the end of the file) — k4k
You can't ignore shift/reduce conflicts. Try to get rid of it first. Usually best to verify that the grammar leaves are really working first, then move "inwards" step by step, with complete tests in each step. I guess in your case, the conflict causes the wrong branch to be selected, so the parser expects more input than it gets. — Rein

Rein Rein · Accepted Answer · 2014-10-14T13:18:30

No, the EOF read doesn't fail. The problem is that you try to solve too many rules at once. Here's a suggestion which should get you started:

First, in bison.y, comment out or remove all rules and replace them with

S:      AP_Bracket_open1 Identifier AP_Bracket_close1;

Identifier : AP_Name | AP_Words;

Recompile, and test the following input one at a time:

( c )

( 5 )

Both should work! Your parser's last line should say Success!

Next, you ought to verify that your value symbol works, because its rule only contains tokens and already verified symbols (in this case Identifier only). Remove or comment out all rules and replace them with:

S:      Value;

Identifier : AP_Name | AP_Words;

Value : Identifier
      | AP_Quote Identifier AP_Quote
      | Identifier AP_Or Identifier
      | Identifier AP_Dash Identifier
      ;

Recompile, and test at least the following input combinations:

c
5
"c"
"5"
c | 5
c - 5
5 | c
5 - c

Yes, test really all possible combinations, and even more (try inserting spaces and line breaks also).

Now continue with Plurality, Regex, Patern, Block1, Block2 in exactly that order. Sooner or later you'll also find your shift/reduce conflict.

Good luck!

Implementations of the «syntax» flex using tools flex and bison

UPD:

1 Answers