I'm dealing with a tricky problem in my flex/bison lexer/parser.
Here are some flex rules for Roman numerals and arbitrary identifiers:
"I"|"II"|"III"|"IV"|"V"|"VI"|"VII"|"i"|"ii"|"iii"|"iv"|"v"|"vi"|"vii" { return NUMERAL; }
"foobar" { return FOOBAR; }
[A-Za-z0-9_]+ { return IDENTIFIER; }
Now, consider this simple grammar:
%token <numeral> NUMERAL
%token <foobar> FOOBAR
%token <identifier> IDENTIFIER
program
: NUMERAL FOOBAR { }
;
Finally, here is an example input:
IVfoobar
I intend for this to lex as the numeral IV followed by a FOOBAR. How can I prevent it from lexing as the numeral I followed by the identifier "Vfoobar", or as the single identifier "IVfoobar"? Both of those are invalid for this grammar.
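To make the problem reproducible, here is a minimal standalone scanner built from the rules above. It is only a sketch: the token codes, the whitespace and catch-all rules, and the `main` driver are assumptions added for illustration (in the real project the token codes come from the bison-generated header).

```lex
%option noyywrap
%{
#include <stdio.h>
/* Hypothetical token codes for this standalone test only;
   normally these come from the bison-generated header. */
enum { NUMERAL = 258, FOOBAR, IDENTIFIER };
%}
%%
"I"|"II"|"III"|"IV"|"V"|"VI"|"VII"|"i"|"ii"|"iii"|"iv"|"v"|"vi"|"vii" { return NUMERAL; }
"foobar"        { return FOOBAR; }
[A-Za-z0-9_]+   { return IDENTIFIER; }
[ \t\r\n]+      { /* skip whitespace (added for the test driver) */ }
.               { fprintf(stderr, "unexpected character: %s\n", yytext); }
%%
/* Print each token's name and matched text until end of input. */
int main(void)
{
    int tok;
    while ((tok = yylex()) != 0) {
        const char *name = tok == NUMERAL ? "NUMERAL"
                         : tok == FOOBAR  ? "FOOBAR"
                         : "IDENTIFIER";
        printf("%s \"%s\"\n", name, yytext);
    }
    return 0;
}
```

Saved under a hypothetical name such as scanner.l, this can be built with `flex scanner.l && cc lex.yy.c -o scanner` and run with `echo IVfoobar | ./scanner`. Because flex always prefers the longest match, the whole input matches the identifier rule and the scanner reports a single IDENTIFIER "IVfoobar", not NUMERAL followed by FOOBAR.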
Why is IVfoobar an invalid identifier? Or to put it another way, what exactly is a valid identifier? – rici