2
votes

I'm writing a small program in Flex/Bison to tokenize/parse a query language I created. I was wondering if there is any way to create any keywords in Flex/Bison.

What I mean is: flex breaks the input down into a list of tokens, but is there a way to create a list of keywords, so that every time flex sees one of them it returns the token KEYWORD?

Or is the following the only way to do this:

"dog"|"cat"     return KEYWORD;

Is there any data structure that flex/bison can use, so that every time it sees a member of that data structure, it recognizes it as a keyword?

Thanks, Sarah

2
Can you explain what problem you encounter with the following construct? "dog"|"cat" return KEYWORD; (fjardon)
I didn't run into any problem; I was wondering if there are any data structures that could be used. (sap)

2 Answers

3
votes

I think a better approach is to handle this in bison, like this:

in flex:

"dog" { return T_DOG; }
"cat" { return T_CAT; }
...

and in bison you add a rule that accepts any of those tokens:

keywords: T_DOG | T_CAT | ... ;

other_rule: keywords T_ACTION;
1
votes

Trying to read between the lines to figure out what you're actually asking, it seems like what you want is to be able to change the keywords at runtime. If your keywords all follow a common pattern, and all other words (non-keywords) that follow that pattern are the same token, you can use a hash table or other lookup table. I'm going to use a C++ std::map here, but you could use any other data structure that allows lookups:

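%{
#include <map>
#include <string>
/* Token codes such as T_IDENTIFIER come from the bison-generated header. */
extern std::map<std::string, int> keyword_table;
%}
%%

[A-Za-z_$][A-Za-z_$0-9]*    { /* Keywords and identifiers share this pattern; the table tells them apart. */
                              auto k = keyword_table.find(std::string(yytext));
                              if (k != keyword_table.end())
                                  return k->second;
                              return T_IDENTIFIER; }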

Now you can put any identifier with an associated token into the keyword_table, and the lexer will recognize that keyword and return the corresponding token. Any identifier that is not recognized as a keyword (not in the table) will return a T_IDENTIFIER token.
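
For illustration, here is a minimal sketch of how the table might be defined and populated outside the lexer. The token enum and the classify() helper are hypothetical: in a real project the token codes would come from the header bison generates, and the lookup would live in the flex action shown above rather than in a separate function.

```cpp
#include <map>
#include <string>

// Hypothetical token codes for this sketch only; normally these are
// defined by the bison-generated header, not by hand.
enum Token { T_IDENTIFIER = 258, T_DOG, T_CAT };

// Definition matching the extern declaration in the lexer.
// Entries can be added or removed while the program is running.
std::map<std::string, int> keyword_table = {
    {"dog", T_DOG},
    {"cat", T_CAT},
};

// Hypothetical helper that mirrors the body of the flex action:
// look the lexeme up and fall back to T_IDENTIFIER if it is not a keyword.
int classify(const std::string& lexeme) {
    auto k = keyword_table.find(lexeme);
    if (k != keyword_table.end())
        return k->second;
    return T_IDENTIFIER;
}
```

With this in place, something like keyword_table["kitty"] = T_CAT; executed at runtime would make the lexer start treating "kitty" as a keyword, and keyword_table.erase("kitty") would turn it back into an ordinary identifier.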