2
votes

Working on an assignment that requires the use of flex and bison to check if a given string is in a language. This comes only after seeing basic examples where flex was used to basically spit back what was read in.

As an example, a string in the format {a^n b^n c^n} with n > 0 would be in the language. So the string aabbcc would be valid.

Is there a way for flex to count specific characters read? As an example, if the string aaabbbcc where given, it would count 3 a's, 3 b's, and 3 c's? Or should I simply use flex to check that the input string is simply in the proper format and then use grammar rules in bison to check if it in the language?

EDIT
I've been working on it for a while and seem to have a semi working version. The problems I'm encountering now are that yyparse() never seems to exit and when an invalid string is given, it encounters a syntax error. As an example:
The string "aabbcc" should be in what I'm calling L2. I will get the following output:

grammar recognizer result:  
L2 recognized 

Then it just stops and never completes. Additionally, the string "hjnkll," should not be recognized but I only get syntax error when entering something like that.
Flex

...  
A [a]*
B [b]*
C [c]*
D [d]*
E [e]*
Z [^a-e\n][.]*
%%
{A} {a = yyleng;} return A;
{B} {b = yyleng;} return B;
{C} {c = yyleng;} return C;
{D} {d = yyleng;} return D;
{E} {e = yyleng;} return E;
{Z} {z = yyleng;} return Z;
\n return NL;
%%

Bison Snipets

%token A
%token B
%token C
%token D
%token E
%token Z
%token NL

%%
/*grammer rules*/
transunit: L1 | L2 | L5 | L6
    {
    printf("\n*****Congratulations; Parse Successful*****\n");
    }
;
L2: A B C NL
    {
    /*Language 2 : L(G2) = {a^nb^nc^n} with n > 0*/
    if(a==b && b==c && d==0 && e==0 && a!=0 && z==0){
        printf("\nLG2 recognized\n");
    } else {
        printf("\nSorry language not recognized\n");
    }
    }
;
/*None of the above : not recognized*/
L6: Z NL
    {
    printf("\nSorry language not recognized\n");
    }
;
3

3 Answers

2
votes

Just for completeness, it is simple to add count checks to bison. Here's a simple example (which recognizes or complains about input lines, so as to make it easier to play with):

I left out most of the boilerplate stuff, including a main and yyerror function. But I did put in some flex options. Using 'a' to mean "any number of as" is stretching the boundaries; it would have been better to have defined tokens called As, Bs and Cs, and then have split the flex rules into five different rules, etc., etc. This was shorter. :)

bison

%{
#include <stdio.h>
%}
%%
start: line
     | start line
     ;

line: 'a' 'b' 'c' '\n' { if ($1 == $2 && $2 == $3)
                           fputs("Good!\n", stderr);
                         else
                           fputs("Bad!\n", stderr);
                       }

flex:

%{
extern int yylval;
%}
%option noyywrap noinput nounput    
%%
"a"+|"b"+|"c"+|.|\n { yylval = yyleng; return *yytext; }
1
votes

The string anbncn is neither regular nor context free. The string can only be generated by a Context Sensitive Grammar or an Unrestricted Grammar depending on the value of n. So the string can not be checked using Flex or Bison. Flex is a scanner generator and it can only check for regular expressions. Bison is a parser generator and it can only check for Context Sensitive Languages.

1
votes

For your specific example, you can match a+b+c+ using states. Afterwards, simply check the number of occurences of a's, b's, and c's and make sure they're all equal. However, this solution quickly grows exponentially complex if your language has more than 2 or 3 rules.

%{
int a = 0, b = 0, c = 0;
%}
%x Bstate Cstate
%%
"a"+                { a = yyleng; BEGIN(Bstate);           }
.                   { /* not in language; error! */        }
<Bstate>"b"+        { b = yyleng; BEGIN(Cstate);           }
<Bstate>.           { /* not in language; error! */        }
<Cstate>"c"+        { c = yyleng; 
                      if(a != b || a != c) { 
                          /* not in language; error! */
                      }
                      BEGIN(INITIAL);                      }
<Cstate>.           { /* not in language; error! */        }
%%

Conversely, you could just tell flex to match "a"+"b"+"c"+ then iterate over the string character by character, but this makes using flex overkill.

See here for more information on states: http://aquamentus.com/flex_bison.html#17`