Flex - create a buffer state from a string WITHOUT setting it as the active buffer

Question

The flex manual says:

The following routines are available for setting up input buffers for scanning in-memory strings instead of files. All of them create a new input buffer for scanning the string, and return a corresponding YY_BUFFER_STATE handle (which you should delete with yy_delete_buffer() when done with it). They also switch to the new buffer using yy_switch_to_buffer(), so the next call to yylex() will start scanning the string.

It then lists yy_scan_string and the related functions.

Usually Flex's default settings are fine: I have one file to scan, or I only need the one scanner. This task is different, however.

I am trying to create a scanner that can scan multiple files and some strings. It is reentrant, and I've never used YY_INPUT() before.

So my question is as follows:

When a buffer is finished and there's nothing left to scan, does Flex pop that state off the stack and, if there's another state, start scanning that one from where it left off?

By "where it left off" I mean as if it were midway through a rule. Suppose I try to match ab: if one buffer state ends with a and another starts with b, how is that treated?

Problem

/** Pushes the new state onto the stack. The new state becomes
 *  the current state. This function will allocate the stack
 *  if necessary.
 *  @param new_buffer The new state.
 *  @param yyscanner The scanner object.
 */
void yypush_buffer_state (YY_BUFFER_STATE new_buffer , yyscan_t yyscanner)
{
    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
    if (new_buffer == NULL)
        return;

    yyensure_buffer_stack(yyscanner);

    /* This block is copied from yy_switch_to_buffer. */
    if ( YY_CURRENT_BUFFER )
        {
        /* Flush out information for old buffer. */
        *yyg->yy_c_buf_p = yyg->yy_hold_char;
        YY_CURRENT_BUFFER_LVALUE->yy_buf_pos = yyg->yy_c_buf_p;
        YY_CURRENT_BUFFER_LVALUE->yy_n_chars = yyg->yy_n_chars;
        }

    /* Only push if top exists. Otherwise, replace top. */
    if (YY_CURRENT_BUFFER)
        yyg->yy_buffer_stack_top++;
    YY_CURRENT_BUFFER_LVALUE = new_buffer;

    /* copied from yy_switch_to_buffer. */
    yy_load_buffer_state(yyscanner );
    yyg->yy_did_buffer_switch_on_eof = 1;
}

Pushing NULL won't work: as the code above shows, yypush_buffer_state returns immediately when new_buffer is NULL.
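A toy model of the stack logic in the yypush_buffer_state code quoted above may make the NULL case concrete. This is plain C, not flex: toy_buffer, toy_stack, toy_current and toy_push are hypothetical names, the stack has a fixed size, and the saving and reloading of scanner state is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a flex buffer; only a name, for illustration. */
typedef struct { const char *name; } toy_buffer;

/* Hypothetical fixed-size stack, loosely mirroring yy_buffer_stack and
 * yy_buffer_stack_top in the generated scanner. */
#define TOY_STACK_MAX 8
typedef struct {
    toy_buffer *slots[TOY_STACK_MAX];
    size_t top;                 /* index of the current buffer */
} toy_stack;

/* Current buffer, or NULL when nothing has been pushed yet
 * (plays the role of YY_CURRENT_BUFFER). */
static toy_buffer *toy_current(const toy_stack *s) {
    return s->slots[s->top];
}

/* Follows the control flow of the quoted yypush_buffer_state. */
static void toy_push(toy_stack *s, toy_buffer *b) {
    if (b == NULL)
        return;                 /* pushing NULL is a silent no-op */
    if (toy_current(s) != NULL) {
        assert(s->top + 1 < TOY_STACK_MAX);
        s->top++;               /* only push if a top exists... */
    }
    s->slots[s->top] = b;       /* ...otherwise replace the empty top */
}
```

In this model, as in the quoted code, pushing onto an empty stack fills the empty top slot instead of growing the stack, and pushing NULL changes nothing.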

Old stuff

What I expect

I'd hope that yylex returning 0 means the end of a buffer (and gives me a chance to manipulate the buffers).

I'd also hope that, unless I explicitly restart the scanner, it will carry on right where it left off (so ab would be matched if one buffer ended with a and the next started with b).

Secondly, how can I prepare a string buffer without having flex set it as the active buffer?

I suppose I could create a new "null" buffer (with the buffer-creation function, yy_create_buffer), push that onto the stack, and then create the string buffer.

When I create a buffer from a string, I expect that function to set the input buffer, i.e. the top of the input stack, which would harmlessly replace my "null" buffer.

This seems quite messy for something otherwise so lovely.

Reasons for asking

Testing this would be very difficult. I'd have to assume I know how flex works, then design tests that pass or fail based on that model to see whether it really behaves that way. This would take a long time, and I could be unlucky and get results that support my model even if it is wrong.

Chris Dodd · Accepted Answer · 2013-11-29T21:41:45

Tokens cannot cross buffer boundaries: all the characters for a single token must come from the same buffer. It's easiest to think of a buffer as holding a sequence of characters followed by an EOF. Flex matches a token by reading characters from the current buffer, and if no rule matches, it falls back to the default "match any single character as a token and echo it" rule. Your action code for a token may switch buffers, in which case the next token will come from the new buffer. Buffers are never switched in the middle of a token, and never switched "automatically": you need an explicit switch somewhere in your code.
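The boundary rule can be illustrated with a toy scanner in plain C. It is not flex: toy_scan, TOK_AB and TOK_CHAR are hypothetical, and the toy has just two rules, "ab" and "any other single character". The point it models is that lookahead never leaves the current buffer, so an a at the end of one buffer and a b at the start of the next come out as two tokens.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for the toy's two "rules". */
enum { TOK_AB = 1, TOK_CHAR = 2 };

/* Scans each NUL-terminated buffer to its end before moving on to the
 * next one, standing in for an explicit switch at end of buffer.
 * Writes up to max token kinds into out and returns how many it found. */
static size_t toy_scan(const char *const *bufs, size_t nbufs,
                       int *out, size_t max) {
    size_t count = 0;
    for (size_t b = 0; b < nbufs; b++) {
        const char *p = bufs[b];
        while (*p != '\0') {
            int kind;
            /* p[1] is this buffer's terminator when p is its last
             * character, so the match can never see the next buffer. */
            if (p[0] == 'a' && p[1] == 'b') {
                kind = TOK_AB;
                p += 2;
            } else {
                kind = TOK_CHAR;
                p += 1;
            }
            assert(count < max);
            out[count++] = kind;
        }
    }
    return count;
}
```

With a single buffer "ab" the toy produces one TOK_AB; with buffers "a" and "b" it produces two TOK_CHAR tokens.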

When flex gets to the end of a buffer (no characters left except the EOF marker), it will match the <<EOF>> rule (which might switch to a new buffer), or call yywrap (you can think of yywrap as the "default" <<EOF>> rule). The default yywrap doesn't change the buffer and causes yylex to return a 0 token. Since the buffer is unchanged, if you call yylex again without changing buffers, it will do the same thing again.
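A toy model in plain C of the two end-of-buffer behaviours just described. It is not flex's implementation: toy_scanner, toy_push_str, toy_lex and the pop_on_eof switch are hypothetical, and every character is treated as a token. With pop_on_eof set to 0 it acts like the default yywrap (the buffer is left alone, so each further call returns 0 again); with pop_on_eof set to 1 it acts like an <<EOF>> rule that pops the finished buffer and keeps scanning the one underneath.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 4

/* Hypothetical scanner: a stack of NUL-terminated strings, each pointer
 * doubling as that buffer's read position. */
typedef struct {
    const char *buf[TOY_MAX];
    size_t depth;       /* number of buffers on the stack */
    int pop_on_eof;     /* 0: default-yywrap-like, 1: <<EOF>>-pop-like */
} toy_scanner;

static void toy_push_str(toy_scanner *s, const char *str) {
    assert(s->depth < TOY_MAX);
    s->buf[s->depth++] = str;
}

/* Returns the next character as a token, or 0 at end of input. */
static int toy_lex(toy_scanner *s) {
    while (s->depth > 0) {
        const char **top = &s->buf[s->depth - 1];
        if (**top != '\0')
            return (unsigned char)*(*top)++;  /* consume one character */
        if (!s->pop_on_eof)
            return 0;   /* buffer unchanged: the next call returns 0 too */
        s->depth--;     /* pop the finished buffer and continue below */
    }
    return 0;           /* stack is empty */
}
```

In a real reentrant scanner the popping behaviour is usually an <<EOF>> action that calls yypop_buffer_state(yyscanner) and then yyterminate() once YY_CURRENT_BUFFER is NULL, following the example in the flex manual.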

The fact that yy_scan_string/yy_scan_bytes switch to the new buffer is an annoyance that you can get around by saving the current buffer and restoring it:

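/* Typically placed in the .l file, where the YY_CURRENT_BUFFER macro is available.
   In a reentrant scanner, pass yyscanner as the extra last argument to each call. */
YY_BUFFER_STATE cur = YY_CURRENT_BUFFER;  /* remember the active buffer */
YY_BUFFER_STATE n = yy_scan_string(str);  /* creates n AND switches to it */
yy_switch_to_buffer(cur);                 /* switch back; n is now inactive */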

Generally this will be immediately followed by yypush_buffer_state(n).
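The save, scan, restore and push sequence can be traced with a toy buffer stack in plain C. Everything here is hypothetical (toy_stack, toy_switch_to, toy_scan_string, toy_push, toy_push_string) and only imitates how the flex functions treat the stack: switching replaces the top entry, scanning a string switches to it, and pushing adds a new top. The early return for an empty stack is an addition of this sketch, not part of the answer. Judging from the flex skeleton, yy_switch_to_buffer with a NULL buffer would reload scanner state through a NULL pointer, so the restore step only makes sense when a buffer was already active.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 8

/* Hypothetical buffer stack; slot[top] is the active buffer. */
typedef struct {
    const char *slot[TOY_MAX];
    size_t top;
} toy_stack;

/* Like yy_switch_to_buffer: replaces the top entry, does not push. */
static void toy_switch_to(toy_stack *s, const char *b) {
    s->slot[s->top] = b;
}

/* Like yy_scan_string: "creates" a buffer AND makes it active. */
static const char *toy_scan_string(toy_stack *s, const char *str) {
    toy_switch_to(s, str);       /* the side effect the question complains about */
    return str;
}

/* Like yypush_buffer_state, for a non-NULL buffer. */
static void toy_push(toy_stack *s, const char *b) {
    if (s->slot[s->top] != NULL) {
        assert(s->top + 1 < TOY_MAX);
        s->top++;
    }
    s->slot[s->top] = b;
}

/* Save the current buffer, scan the string, restore, then push. */
static void toy_push_string(toy_stack *s, const char *str) {
    const char *cur = s->slot[s->top];         /* YY_CURRENT_BUFFER */
    const char *n = toy_scan_string(s, str);   /* n is now active */
    if (cur == NULL)
        return;                  /* empty stack: n is already the only buffer */
    toy_switch_to(s, cur);       /* restore the previously active buffer */
    toy_push(s, n);              /* n ends up on top of cur */
}
```

Calling toy_scan_string alone overwrites the active entry, whereas toy_push_string leaves the original buffer underneath the new one, which is the stack shape a later pop at end of file relies on.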
