END OF FILE token with flex and bison (only works without it)

Question

OK this is kind of an odd question because what I have here works the way I want it to. What I'm doing is writing a parser for a lambda calculus expression. So an expression can be one of four things:

variable
constant
(expression expression)
(lambda variable.expression)

Now as you can see, the last two expressions have expressions within them. What I was trying to do was determine the overall expression so I can report which type it is. So for example the expression ((lambda x.(f1 x)) 100) is a combination overall. My idea was to return an END token from flex when it reached the end of file. My code looks like this:

overallexpr: combo END { printf(" The overall expression is a combination\n"); } |
         constant END { printf(" The overall expression is a constant\n"); } |
         VARIABLE END { printf(" The overall expression is a variable\n"); } |
         l_expr END { printf(" The overall expression is a lambda expression\n"); }
;

expr: combo | constant | VARIABLE | l_expr
;

combo: LPARENS expr expr RPARENS
;

constant: FUNCTION | NUMBER
;

l_expr: LPARENS LAMBDA VARIABLE DOT expr RPARENS
;

If I put that END token after the four possibilities in overallexpr like combo END instead of just combo, it doesn't work. But the END token is received by the parser. If I print each token as it is read (with variable, function, and number values) it looks like this

LPARENS  LPARENS  LAMBDA  VARIABLE x  DOT  LPARENS  FUNCTION f1  VARIABLE x  RPARENS  RPARENS  NUMBER 100  RPARENS  END Sorry, Charlie

It may be hard to tell but this should work. The combination ends with the RPARENS and there's an END token right after it. But it doesn't evaluate as an overall expression. However if I take out the END tokens it seems to work every time. I always get an overall message printed, even though the productions of overallexpr and expr are exactly the same. The output is identical to the last one except it says "The overall expression is a combination" before the END token. So my question is why? Does bison always just try the earlier productions first? And why would it work without the END but not with it? Especially because you can see the END token right after it says it's a combination. I'm just trying to get a better understanding of how Bison works.

rici rici · Accepted Answer · 2012-11-02T06:13:13

It's a little hard to tell what's going on here without seeing your code (and I don't really want to wade through it, anyway), but I'll hazard a guess: my guess is that you're replacing the standard yylex EOF indication (i.e., returning 0) with your END token. If the bison parser never sees an EOF, it never finishes the parse.

In effect, bison creates a special production all of its own:

__parse__: __start__ $;

parse is the (actually unnamed) production, and __start__ is whatever you've declared as %start (or the first non-terminal, if you don't declare it explicitly). In your case, I suppose it's overallexpr. $ is the symbol conventionally used to indicated the EOF mark.

Now, when do bison parser actions happen? Although in some cases, they can happen where you think they will (i.e. immediately after the last token in the production), they usually don't happen until the parser takes a peek at the following token. It's allowed to do that; that's why it's called an LALR(1) parser: the 1 is the number of future tokens it's allowed to look at before deciding exactly what to do with the ones it's already got. It almost always needs this information, and often works as though it did even if it seems to you and me that it doesn't.

So in all probability, the parser will not actually do the overallexpr reduction -- or, in other words, it won't execute the action associated with the overallexpr rule -- until it convinces itself that the end-of-file marker is the next token.

Now, if you leave your END token out of the rule and the lexer actually returns EOF, then bison do the reduction when it sees the EOF.

END OF FILE token with flex and bison (only works without it)

1 Answers