Is it possible for a Bison rule to expand into more tokens instead of reducing? Put another way: is it possible to insert extra tokens into the parser input, to be parsed before the next token?
Here is an example where I might want this:
Suppose I want a parser that understands three token types: numbers (just positive integers, for the sake of simplicity - INT), words (any number of letters, upper or lower case - STRING) and some other kind of symbol (let's use an exclamation mark, for no good reason - EXC).
Suppose I have a rule that reduces a word followed by a number followed by an exclamation mark. The rule produces an integer; for now, let's say it simply doubles its input. The rule is also recursive: the number in the middle may itself be another instance of the same rule.
I also have a rule to accept any number of these in a row (the start rule).
The Bison parser looks like this: (quicktest.y)
%{
#include <stdio.h>

/* Declared up front because the generated yyparse() calls them
   before their definitions in the epilogue below. */
int yylex(void);
void yyerror(const char *s);
%}
%union {
int INT_VAL;
}
%token STRING EXC
%token <INT_VAL> INT
%type <INT_VAL> somenumber
%%
start: somenumber {printf ("Result: %d\n", $1);}
| start somenumber {printf ("Result: %d\n", $2);}
;
somenumber: STRING INT EXC {$$ = $2 *2;}
| STRING somenumber EXC {$$ = $2 *2;}
;
%%
int main(void){
return yyparse();
}
void yyerror(const char *s){
fprintf(stderr, "%s\n", s);
}
The tokens can be generated with a flex lexer like so: (quicktest.l)
%{
#include <stdlib.h>   /* atoi */
#include "quicktest.tab.h"
%}
%%
[A-Za-z]+ {return STRING;}
[1-9][0-9]* {yylval.INT_VAL = atoi(yytext); return INT;}
"!" {return EXC;}
.|\n {}   /* ignore everything else, including newlines */
This can be built with the following commands:
bison -d quicktest.y
flex quicktest.l
gcc -o quicktest quicktest.tab.c lex.yy.c -lfl -ggdb
I can now input something like this:
double double 2 ! !
and get the result 8
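To make the doubling per layer concrete, here is a minimal sketch in C that evaluates the same somenumber rule over an already-tokenized array. It is a hand-written recursive-descent evaluator, not what Bison generates (Bison builds a bottom-up LALR parser), and the token codes T_STRING, T_INT, T_EXC, T_END and the struct token type are hypothetical names chosen only to mirror the grammar above.

```c
#include <stddef.h>

/* Hypothetical token codes for this sketch only; Bison assigns its own. */
enum tok { T_STRING, T_INT, T_EXC, T_END };

struct token {
    enum tok kind;
    int value; /* meaningful only for T_INT */
};

/* Mirrors:  somenumber: STRING INT EXC        { $$ = $2 * 2; }
 *                     | STRING somenumber EXC { $$ = $2 * 2; }
 * The token array must end with T_END.
 * Returns 0 and stores the value in *out on success, -1 on a syntax error.
 * *pos is advanced past the tokens consumed. */
int somenumber(const struct token *t, size_t *pos, int *out)
{
    int inner;

    if (t[*pos].kind != T_STRING)
        return -1;
    ++*pos;

    if (t[*pos].kind == T_INT) {
        /* innermost layer: STRING INT EXC */
        inner = t[*pos].value;
        ++*pos;
    } else if (somenumber(t, pos, &inner) != 0) {
        /* nested layer: STRING somenumber EXC */
        return -1;
    }

    if (t[*pos].kind != T_EXC)
        return -1;
    ++*pos;

    *out = inner * 2; /* each STRING ... EXC layer doubles its contents */
    return 0;
}
```

For double double 2 ! ! the inner layer gives 2 * 2 = 4 and the outer layer 4 * 2 = 8, which matches the parser's output.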
Now suppose I want to spare the user from typing lots of exclamation marks in a row, like this:
a b c d e f 2 ! ! ! ! ! !
I'd like to let them type something like this instead:
a b c d e f 2 !*6
So I can add a flex rule for such a token that simply extracts the number of exclamation marks needed:
!\*[1-9][0-9]* {
/* yytext is e.g. "!*6"; the count starts after the two-character "!*" prefix,
   so no temporary copy is needed */
yylval.INT_VAL = atoi(yytext + 2);
printf("Multiple exclamations: %d\n", yylval.INT_VAL);
return REPEAT_EXC;
}
But how would I implement the bison side of things?
I can add the token type like so:
%token <INT_VAL> REPEAT_EXC
And then a rule of some kind perhaps?
repeat_exc: REPEAT_EXC {/*expand into n exclamation marks (EXC tokens)*/}
;
Does Bison support this in any way?
If not, how should I implement this?
Should I somehow have the lexer return the EXC token n times when it sees the repeat expression? (I'd rather avoid this if possible, because it requires the flex code to keep track of some kind of state: it is either in a "repeat exclamation" state or in the normal state, which makes the lexer harder to maintain.)
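For comparison, the "return EXC n times" idea does not have to put its state inside the flex rules; it can live in a thin wrapper between the scanner and the parser. The sketch below is self-contained and hypothetical: raw_next() stands in for the flex scanner by handing out tokens from an array, and next_token() is the wrapper that turns one REPEAT_EXC with count n into n EXC tokens by remembering how many are still owed. In the real program, one possible way to wire this up is to rename the generated scanner with flex's YY_DECL macro (for example #define YY_DECL int raw_yylex(void) in the definitions section of the .l file) and define yylex() in the parser as the wrapper; that wiring is an assumption and is not shown here.

```c
#include <stddef.h>

/* Hypothetical token codes; in the real program they come from quicktest.tab.h. */
enum tok { T_END = 0, T_STRING, T_INT, T_EXC, T_REPEAT_EXC };

/* Hypothetical stand-in for the flex scanner: hands out tokens from a fixed array. */
struct raw_token { enum tok kind; int value; };

static const struct raw_token *raw_input; /* must end with T_END */
static size_t raw_pos;

static enum tok raw_next(int *value)
{
    const struct raw_token *t = &raw_input[raw_pos];
    if (t->kind != T_END)
        ++raw_pos; /* stay on T_END once the input is exhausted */
    *value = t->value;
    return t->kind;
}

/* Number of EXC tokens still owed from the last REPEAT_EXC. */
static int pending_exc = 0;

/* Wrapper the parser would call instead of the raw scanner.
   A REPEAT_EXC with count n comes out as n consecutive EXC tokens;
   all other tokens pass through unchanged. */
enum tok next_token(int *value)
{
    if (pending_exc > 0) {
        --pending_exc;
        return T_EXC;
    }

    enum tok kind = raw_next(value);
    if (kind == T_REPEAT_EXC) {
        if (*value <= 0)             /* nothing to expand: skip the token */
            return next_token(value);
        pending_exc = *value - 1;    /* hand out one EXC now, owe the rest */
        return T_EXC;
    }
    return kind;
}
```

Feeding it the tokens of a b c d e f 2 !*6 yields six STRING tokens, one INT, six EXC tokens and then T_END, which is exactly the stream the original grammar already accepts, so the grammar itself would not need to change.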