
I am trying to parse two files with win_flex and win_bison, but I am running into a problem: after returning from an included file, the lexer is not in the start condition I expect. In the lex file:

include[ \t]+\" { BEGIN(include_state); }
<include_state>([^\\\"\n]|\\.)+ {
    yyin = fopen(yytext, "r");
    if (!yyin) {
        printf("Error opening include file: %s\n", yytext);
        return 1;
    }
    yypush_buffer_state(yy_create_buffer(yyin, YY_BUF_SIZE, yyscanner),
        yyscanner);
    BEGIN(INITIAL);
}
<include_state>\"[ \t]*";" { BEGIN(INITIAL); }
<<EOF>> {
    yypop_buffer_state(yyscanner);
    if (!YY_CURRENT_BUFFER)
        yyterminate();
}

The first file being parsed includes the second file as follows:

include "hello.txt";

What happens is that the second file ("hello.txt") is parsed without problems, but something goes wrong when returning to the first file. The closing quote and semicolon at the end of the include line are read, but the lexer is in the INITIAL start condition, so the rule I expect to match does not. I know this for sure because if I add the following rule, it matches:

<INITIAL>\"[ \t]*";" { printf("Right matching, wrong state.\n"); return 1; }

Why does it not return to the include_state and how can I fix this?

I answered this in your previous question: The start condition is not part of the buffer state, so you need to manage it separately. In this case, the simplest solution would be to read the closing " before doing the include. - rici

2 Answers


It looks like the scanner goes to INITIAL because that's what you tell it to do right after calling yypush_buffer_state(). How is it ever going to match the second <include_state> rule if you do that? What happens if you delete that state change?


The start condition is global. It is not part of the buffer state, and pushing or popping a buffer state does not change it. You have to manage it yourself.

You could restore the start condition to include_state after calling yypop_buffer_state(). You could even keep your own stack of start conditions alongside the stack of buffers. But the simplest solution seems to be to read the closing quote of the include statement before performing the include, so that you are always in the INITIAL state when changing buffers:

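<include_state>[^\n"]+\" {
    yytext[yyleng - 1] = '\0'; // Get rid of the closing quote.
    // Open into a local first: assigning a failed fopen() to yyin would
    // leave the current buffer reading from a NULL stream.
    FILE *f = fopen(yytext, "r");
    if (!f) {
        fprintf(stderr, "Error opening include file: %s\n", yytext);
        return 1;
    }
    yypush_buffer_state(yy_create_buffer(f, YY_BUF_SIZE, yyscanner),
                        yyscanner);
    // The closing quote is already consumed, so both the included file and
    // the rest of the including file are scanned in INITIAL.
    BEGIN(INITIAL);
}
<include_state>.|\n   { /* Handle syntax error in the include statement */ }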
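If you would rather keep the original layout (consuming the closing quote and semicolon in include_state), the other option mentioned above is to save the start condition when a buffer is pushed and restore it when that buffer is popped. The following is a minimal plain-C sketch of that bookkeeping only, outside of flex. All names in it (scanner_model, push_buffer, pop_buffer, INITIAL_SC, INCLUDE_SC, MAX_INCLUDE_DEPTH) are hypothetical: an array of file names stands in for flex's buffer stack, and an enum field stands in for the global start condition.

```c
#include <assert.h>

/* Hypothetical stand-ins for the scanner's start conditions. */
enum start_condition { INITIAL_SC, INCLUDE_SC };

#define MAX_INCLUDE_DEPTH 16

/* Hypothetical model: a stack of input buffers (just file names here)
 * plus a parallel stack holding the start condition to restore when
 * each buffer is popped. */
struct scanner_model {
    const char *buffers[MAX_INCLUDE_DEPTH];
    enum start_condition saved_sc[MAX_INCLUDE_DEPTH];
    int depth;                 /* number of buffers currently on the stack */
    enum start_condition sc;   /* current, global start condition */
};

/* Push a new input buffer. The current start condition is saved first,
 * then reset to INITIAL for the new buffer. Returns 0 on success,
 * -1 if the include nesting is too deep. */
int push_buffer(struct scanner_model *s, const char *name)
{
    if (s->depth >= MAX_INCLUDE_DEPTH)
        return -1;
    s->saved_sc[s->depth] = s->sc;
    s->buffers[s->depth] = name;
    s->depth++;
    s->sc = INITIAL_SC;
    return 0;
}

/* Pop the current buffer at end of file and restore the start condition
 * that was active when it was pushed. Returns 1 if an enclosing buffer
 * remains (keep scanning), 0 if the stack is now empty (terminate). */
int pop_buffer(struct scanner_model *s)
{
    assert(s->depth > 0);  /* only called while a buffer is current */
    s->depth--;
    s->sc = s->saved_sc[s->depth];
    return s->depth > 0;
}
```

Walking through the question's input: the main file is pushed, matching include " switches to INCLUDE_SC, pushing "hello.txt" saves INCLUDE_SC and resets to INITIAL_SC, and popping "hello.txt" at its EOF brings INCLUDE_SC back, so the \"[ \t]*";" rule can match. In a real scanner, the push side would sit in the filename rule (saving YY_START before BEGIN(INITIAL)) and the pop side in the <<EOF>> rule (calling BEGIN with the saved value after yypop_buffer_state). Flex can also keep a start-condition stack for you via %option stack with yy_push_state()/yy_pop_state() (taking yyscanner as an extra argument in a reentrant scanner), but that stack is independent of the buffer stack, so you would still call it at those same two points.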