0
votes

I am trying to write a grammatical recognizer using flex and bison to determine if an input string is in L(G), where the language is a union of:

L(G) = {a^i b^j c^k d^l e^m} where i,j,k,l,m > 0 and i=m and k=l

and

L(G) = {e^i d^j c^k b^l a^m} where i,j,k,l,m > 0 and i=2m k=3l and j=2

Right now I have it working fine, but only when using the tokens in the languages. If I include any other token it seems to get ignored and the test passes or fails based on the other allowed tokens. This is problematic because it allows for strings such as "abcdef" to pass the parse even though "f" is not in the language.

The erroneous input that I am testing now is "abcdef". The "abcde" part is correct and gives the correct output, but adding the "f" to the end causes both the syntax error message from yyerror("syntax error"), and the "congratulations; parse succeeded" print statement from main to print.

Using "fabcde" does the same thing I described above. It is giving me the error but it's also giving me the success print statement. I'm using "if(yyparse() == 0))" to print the success statement in main and I'm thinking that might be the culprit here, although I had the same issues when I moved the print statements into the .y file and just used yyparse() and return(1) in main.

Here is my .in file (minus includes):

%%

a return A;

b return B;

c return C;

d return D;

e return E;

. yyerror("syntax error\n\nSorry, Charlie, input string not in L(G)\n"); /* working but still prints success message too */

%%

Here is my .y file (minus includes):

%token A

%token B

%token C

%token D

%token E


%% /* Grammar Rules */

string: as bs cs ds es
{
if(($1 == $5) && ($3 == $4)) {
return(0);
}
else
{
return(-1);
}
}
;

string: es ds cs bs as
{
if(($1 == (2 * $5) && ($3 == (3 * $4)) && ($2 = 2)) {
return(0);
}
else
{
return(-1);
}
}
;


as: A as {$$ = $2 +1;}
;

as: A {$$ = 1;}
;

bs: B bs {$$ = $2 +1;}
;

bs: B {$$ = 1;}
;

cs: C cs {$$ = $2 +1;}
;

cs: C {$$ = 1;}
;

ds: D ds {$$ = $2 +1;}
;

ds: D {$$ = 1;}
;

es: E es {$$ = $2 +1;}
;

es: E {$$ = 1;}
;

%%

my .c file is simple and just returns "congratulations; parse successful" if yyparse() == 0, and "input string is not in L(G)" otherwise.

Everything works perfectly fine when the input strings only include a, b, c, d, and e. I just need to figure out how to make the parser give a syntax error without a success statement if there's any token besides them in the input string.

Here is an image that will help show my issue: The first two parses work as intended. The third one shows my issue.

2
What is the erroneous input? I would expect an input which starts with f, for example, to produce an error return.rici
@rici the erroneous input that I am testing now is "abcdef". The "abcde" part is correct and gives the correct output, but adding the "f" to the end causes both the syntax error message from yyerror("syntax error"), and the "congratulations; parse succeeded" print statement from main to print.DawsonJBailey
Yes, that can easily be explained. Please add those details to your question.rici
@rici Just saw your edit to the comment. Using "fabcde" does the same thing I described in the last comment. It is giving me the error but it's also giving me the success print statement. I'm using "if(yyparse() == 0))" to print the success statement in main and I'm thinking that might be the culprit here, although I had the same issues when I moved the print statements into the .y file and just used yyparse() and return(1) in main. Also thanks for the suggestion, I will add that to my post.DawsonJBailey
My guess is that this is undefined behaviour: return yyerror("syntax error\n\nSorry, Charlie, input string not in L(G)\n"); unless your yyerror function actually returns something (which most of them do not).rici

2 Answers

2
votes

If a (f)lex rule does not return anything, then tokens that it matches will be ignored. This is appropriate for comments, but not for tokens you want to have be errors. If you change your catch-all flex rule to

.    return *yytext;

Then all unrecognized characters in the input (except for newline, which is the only thing . does not match) will be returned, and will likely cause a Syntax error message from your parser (and a failed return from yyparse. If your grammar contains literal character tokens (eg. '#' to match that character), then it will of course match.

0
votes

A bison/yacc generated parser expects to parse an entire correct input, up to and including the end-of-input marker, and only then return a success indication (a return value of 0).

Of course, if the input is syntactically incorrect, the parser may return early with an error indication (which is always the value 1 for syntax errors, and 2 if it runs out of memory). In this case, before the parser returns, it will clean up its internal state and free any allocated memory.

It's important that you let the parser do this. Returning from a semantic action in a bison/yacc parser is at best unwise (since it is almost certainly a memory leak) and can also produce confusion precisely because it may result in successful returns after an error message is produced.

Consider, for example, the case of the input abcdea, which is a valid string followed by an invalid a. It's likely that the semantic action for string will be run before the parser attempts to handle the last a, because of parser table compression (which defers error actions in order to save table entries). But your semantic action actually returns 0, bypassing the parser's error reporting and clean-up. If the input is abcdef and your scanner calls yyerror for the invalid token (which is not a particularly good idea either), then the sequence of actions will be:

  1. Scanner prints an error
  2. Parser executes the string semantic action, which returns 0.

Again, proper error handling and clean-up have been bypassed by the return statement in the semantic action.

So don't do that. If you want to report an error in a semantic action, use YYABORT, which will cleanly terminate the parse with an error return. If your top-level production is correct, on the other hand, do nothing. The parser will then verify that the next input token is the end-of-input marker and return success.