Is there a way to force bison and/or flex to restart scanning after I replace some token with something else?

My particular example is replacing a specific word/string. If I want the word hello to be replaced by echo hello, how can I get flex or bison to replace hello and then start parsing again (to pick up two words instead of just one)? So it would be like:

  • Get token WORD (which is a string type)
  • If hello, replace token value with echo hello
  • Restart parsing entire input (which is now echo hello)
  • Get token WORD (echo)
  • Get token WORD (hello)

I've seen very tempting functions like yyrestart(), but I don't really understand what that function in particular really accomplishes. Any help is greatly appreciated, thanks!

Update 4/23/2010

One hack-and-slash solution I've ended up using: for each word that comes through, I check an "alias" array. If the word has an alias, I replace the value of the word (using, for example, strcpy($1, aliasval), which assumes the buffer behind $1 is large enough to hold the alias) and set an aliasfound flag.

Once the entire line of input has been parsed once, if the aliasfound flag is true, I use yy_scan_string() to switch the buffer state to the input with expanded aliases, and call YYACCEPT.

So then it jumps out to the main function and I call yyparse() again, with the buffer still pointing to my string. This continues until no aliases are found. Once all of my grammar actions are complete, I call yyrestart(stdin) to go back to "normal" mode.
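Stripped of the flex/bison machinery, the rescan loop described above might look roughly like the following plain C sketch. Everything in it is an assumption for illustration: the alias table, the names expand_once() and expand_aliases(), and in particular the done[] guard, which stops an alias from being expanded again in a later pass. Without some guard of that kind, an alias such as hello -> echo hello would be found again on every rescan and the loop would never finish.

```c
#include <string.h>

#define LINE_BUF_LEN 256

struct alias { const char *name; const char *value; };

/* Hypothetical alias table, for illustration only. */
static const struct alias aliases[] = {
    { "hello", "echo hello" },
    { "hi",    "hello" },
};
enum { N_ALIASES = sizeof aliases / sizeof aliases[0] };

/* Return the index of the alias named by the n-byte word w, or -1. */
static int find_alias(const char *w, size_t n)
{
    for (int i = 0; i < N_ALIASES; i++)
        if (strlen(aliases[i].name) == n && strncmp(w, aliases[i].name, n) == 0)
            return i;
    return -1;
}

/* One scanning pass: copy `in` to `out`, replacing every word whose alias
 * was not already expanded in an earlier pass. Sets hit[i] for each alias
 * used. Returns 1 if anything was replaced (the "aliasfound" flag),
 * 0 if nothing was, and -1 if `out` is too small. */
static int expand_once(const char *in, char *out, size_t outsz,
                       const int *done, int *hit)
{
    size_t len = 0;
    int alias_found = 0;

    while (*in) {
        size_t n = strcspn(in, " \t\n");   /* length of the next word */
        const char *text = in;
        size_t textlen = n;

        if (n == 0) {                      /* whitespace: copy one byte */
            textlen = 1;
        } else {
            int i = find_alias(in, n);
            if (i >= 0 && !done[i]) {
                text = aliases[i].value;
                textlen = strlen(text);
                hit[i] = 1;
                alias_found = 1;
            }
        }
        if (len + textlen >= outsz)
            return -1;
        memcpy(out + len, text, textlen);
        len += textlen;
        in += (n == 0) ? 1 : n;
    }
    out[len] = '\0';
    return alias_found;
}

/* Rescan `line` in place until a pass finds no alias to expand.
 * Returns 0 on success, -1 if the expansion does not fit. */
static int expand_aliases(char *line, size_t linesz)
{
    int done[N_ALIASES] = { 0 };
    char tmp[LINE_BUF_LEN];

    for (;;) {
        int hit[N_ALIASES] = { 0 };
        int r = expand_once(line, tmp, sizeof tmp, done, hit);
        if (r < 0 || strlen(tmp) >= linesz)
            return -1;
        strcpy(line, tmp);
        if (r == 0)
            return 0;                      /* nothing left to expand */
        for (int i = 0; i < N_ALIASES; i++)
            done[i] |= hit[i];             /* never expand these again */
    }
}
```

In the real setup, each pass would be one yyparse() call on the string handed to yy_scan_string(), and the fully expanded line would be what the grammar actions finally act on. Because each pass that replaces something retires at least one alias, the loop runs at most N_ALIASES + 1 times.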

If anyone knows how I can effectively expand my words with their alias values, inject them into stdin (or use some other method), and basically expand all aliases (even nested ones) as I go, that would be awesome. I was playing around with yypush_buffer_state() and yypop_buffer_state(), along with yy_switch_to_buffer(), but I couldn't get "inline" substitution with continued parsing working...
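To make the "inline" idea concrete, here is a small hand-written model in plain C of what a stack of input buffers does. It is not flex code and does not call the flex API; the struct names, the alias table, and next_word() are all hypothetical. Pushing a source plays the role of yypush_buffer_state(), and dropping an exhausted source plays the role of yypop_buffer_state() in an <<EOF>> rule. The check in is_active() is an added assumption: it refuses to expand an alias that is already being expanded, so that hello -> echo hello terminates.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 16
#define MAX_WORD  64
#define BLANKS    " \t\n"

struct alias_def { const char *name; const char *value; };

/* Hypothetical alias table, for illustration only. */
static const struct alias_def alias_table[] = {
    { "hello", "echo hello" },
    { "hi",    "hello" },
};

/* One input source: remaining text plus the alias it came from, if any. */
struct source { const char *pos; const struct alias_def *from; };

struct scanner { struct source stack[MAX_DEPTH]; int depth; };

static void scanner_init(struct scanner *s, const char *input)
{
    s->stack[0].pos = input;
    s->stack[0].from = NULL;
    s->depth = 1;
}

static const struct alias_def *lookup_alias(const char *word)
{
    for (size_t i = 0; i < sizeof alias_table / sizeof alias_table[0]; i++)
        if (strcmp(alias_table[i].name, word) == 0)
            return &alias_table[i];
    return NULL;
}

/* Return 1 if alias `a` is already being expanded somewhere on the stack. */
static int is_active(const struct scanner *s, const struct alias_def *a)
{
    for (int i = 0; i < s->depth; i++)
        if (s->stack[i].from == a)
            return 1;
    return 0;
}

/* Store the next fully expanded word in `word` (at least MAX_WORD bytes).
 * Returns 1 if a word was produced, 0 at end of all input. */
static int next_word(struct scanner *s, char *word)
{
    while (s->depth > 0) {
        struct source *top = &s->stack[s->depth - 1];

        top->pos += strspn(top->pos, BLANKS);   /* skip whitespace */
        if (*top->pos == '\0') {                /* source exhausted: pop it */
            s->depth--;
            continue;
        }
        size_t n = strcspn(top->pos, BLANKS);
        if (n >= MAX_WORD)
            n = MAX_WORD - 1;                   /* split overlong words */
        memcpy(word, top->pos, n);
        word[n] = '\0';
        top->pos += n;

        const struct alias_def *a = lookup_alias(word);
        if (a && s->depth < MAX_DEPTH && !is_active(s, a)) {
            /* Push the alias text; it is scanned before the rest of the
             * current source, which gives the inline substitution. */
            s->stack[s->depth].pos = a->value;
            s->stack[s->depth].from = a;
            s->depth++;
            continue;
        }
        return 1;
    }
    return 0;
}
```

With the input "hi world", this yields the words echo, hello, world in that order: hi expands to hello, hello expands to echo hello, and the inner hello is left alone because its alias is still active on the stack. The parser only ever sees the expanded words, so no second yyparse() pass is needed in this model.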

2 Answers

1
votes

It seems to me that the place to fix this is the lexer. I would suggest using flex, which supports a state machine (called "Start Conditions" in the flex documentation). You change states using BEGIN, and the states need to be defined in the definitions section.

So, for example, you could have a rule like

%x in_echo
%%
<INITIAL>hello    { BEGIN(in_echo); yyless(0); return WORD_ECHO; }
<in_echo>hello    { BEGIN(INITIAL); return WORD_HELLO; }

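yyless(n) keeps the first n characters of yytext and returns the rest to the input stream, so yyless(0) pushes the entire match back to be scanned again. The second time around the scanner is in the in_echo state, so the other rule matches.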
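To show the control flow without running flex, here is a hand-written model of those two rules in plain C. It is only a sketch: the lexer struct, the token names, and next_token() are made up for the illustration, and flex generates something quite different internally. Not advancing the position stands in for yyless(0), and assigning to the state field stands in for BEGIN.

```c
#include <string.h>

enum lex_state { ST_INITIAL, ST_IN_ECHO };
enum token { TOK_EOF, TOK_WORD, TOK_WORD_ECHO, TOK_WORD_HELLO };

struct lexer { const char *pos; enum lex_state state; };

/* Return the next token from lx, emitting two tokens for each "hello". */
static enum token next_token(struct lexer *lx)
{
    lx->pos += strspn(lx->pos, " \t\n");        /* skip whitespace */
    if (*lx->pos == '\0')
        return TOK_EOF;

    size_t n = strcspn(lx->pos, " \t\n");       /* length of this word */
    int is_hello = (n == 5 && strncmp(lx->pos, "hello", 5) == 0);

    if (is_hello && lx->state == ST_INITIAL) {
        lx->state = ST_IN_ECHO;                 /* BEGIN(in_echo) */
        /* yyless(0): pos is not advanced, so "hello" is scanned again */
        return TOK_WORD_ECHO;
    }

    lx->pos += n;                               /* consume the word */
    if (is_hello && lx->state == ST_IN_ECHO) {
        lx->state = ST_INITIAL;                 /* BEGIN(INITIAL) */
        return TOK_WORD_HELLO;
    }
    return TOK_WORD;
}
```

For the input "say hello now" this produces TOK_WORD, TOK_WORD_ECHO, TOK_WORD_HELLO, TOK_WORD, then TOK_EOF, so the parser sees two tokens where the input had the single word hello.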

I haven't tried this out myself, but I think this is the structure of the solution you want.

0
votes

Adding an answer based on what I ended up doing, so that this question can be marked as answered. It is the approach described in the update to the question.
