0
votes

I am using Flex & bison on Linux. I have the following setup:

// tokens
CREATE { return token::CREATE; }
SCHEMA { return token::SCHEMA; }
RECORD { return token::RECORD; }
[_a-zA-Z0-9][_a-zA-Z0-9]* { yylval->strval = strdup(yytext); return token::NAME; }
...

// rules
CREATE SCHEMA NAME ...
CREATE RECORD NAME ...
...

Everything works fine. But if a user enters "create schema record ..." (where 'record' is the name of the schema to be created), the parser reports a syntax error: Flex matches 'record' as the RECORD token, so the input would only be accepted by a rule "CREATE SCHEMA RECORD". I understand that keywords can be escaped, but that makes for an awkward user experience. My question is:

"How can I design the above rules so that it accepts 'create schema record ...' and matches this input to 'CREATE SCHEMA NAME ...'?"

Thanks!

2 Answers

4
votes

"Semi-reserved" words are common in languages which have a lot of reserved words. (Even modern C++ has a couple of these: override and final.) But they create some difficulties for traditional scanners, which generally assume that a keyword is a keyword.

The lemon parser generator, which not coincidentally was designed for parsing SQL, has a useful "fallback" feature, where a token which is not valid in context can be substituted by another token (without changing the semantic value). Unfortunately, bison does not implement this feature, nor does any other parser generator I know of. However, in many cases it is possible to get the same effect in a bison grammar. For example, in the simple case presented here, we can substitute:

create_statement: CREATE RECORD NAME ...
                | CREATE SCHEMA NAME ...

with:

create_statement: CREATE RECORD name
                | CREATE SCHEMA name
name: NAME
    | CREATE
    | RECORD
    | SCHEMA
    | ...

Obviously, care needs to be taken that the (semi-)keywords in the list of alternatives for name are not valid in the context in which name is used. This may require the definition of a variety of name productions, valid for different contexts. (This is where lemon-style fallbacks are more convenient.)

If you do this, it is important that the semantic values of the keywords be set up correctly, either by the scanner or by the reduction action of the name non-terminal. If there is only one name non-terminal, it is probably more efficient to do it in the reduction actions. Having the scanner allocate a string for every keyword means unnecessary allocation and deallocation, and the deallocation complicates every other grammar rule in which the keywords appear. The name rule would then look like this:

name: NAME
    | CREATE   { $$ = strdup("CREATE"); }
    | RECORD   { $$ = strdup("RECORD"); }
    | SCHEMA   { $$ = strdup("SCHEMA"); }
    | ...

There are, of course, many other possible ways to deal with the semantic value issue.
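
Putting the pieces together, a minimal sketch of a complete grammar file might look like the following. It assumes the classic C interface (plain token names rather than the token:: prefix used in the question) and a %union with a strval member like the one the question's scanner fills in. The statement_list rule, the printf actions, and the %destructor are illustrative assumptions, not part of the original grammar.

```yacc
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int yylex(void);               /* assumed to be supplied by the flex scanner */
void yyerror(const char *msg); /* assumed to be defined elsewhere */
%}

%union { char *strval; }

/* Only NAME gets a string from the scanner; keywords carry no value. */
%token CREATE SCHEMA RECORD
%token <strval> NAME
%type  <strval> name

/* Free any name that bison discards during error recovery. */
%destructor { free($$); } <strval>

%%

statement_list
    : /* empty */
    | statement_list create_statement
    ;

create_statement
    : CREATE SCHEMA name   { printf("schema %s\n", $3); free($3); }
    | CREATE RECORD name   { printf("record %s\n", $3); free($3); }
    ;

name
    : NAME                 /* default action: $$ = $1 */
    | CREATE               { $$ = strdup("CREATE"); }
    | RECORD               { $$ = strdup("RECORD"); }
    | SCHEMA               { $$ = strdup("SCHEMA"); }
    ;
```

Because every alternative of name produces a freshly allocated string, the rules that use name can treat it uniformly and free it exactly once. Note that the keyword alternatives produce the canonical spelling rather than whatever the user typed; if the original spelling matters, the scanner would have to supply it instead.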

0
votes

You shouldn't do this, for the same reason you can't have a variable in C++ named for, while, or class. But if you really want to, look into Start Conditions (it'll be messy).
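
For reference, a rough sketch of the start-condition approach in flex could look like this. It assumes the classic C interface with a global yylval whose union has a strval member, a case-insensitive scanner (the question's users type lowercase keywords), and a bison-generated header whose name here is hypothetical. EXPECT_NAME is an illustrative name for the start condition.

```lex
%{
#include <string.h>
#include "parser.tab.h"   /* hypothetical name of the header from bison -d */
%}

%option noyywrap case-insensitive
%x EXPECT_NAME

%%

CREATE      { return CREATE; }
SCHEMA      { BEGIN(EXPECT_NAME); return SCHEMA; }
RECORD      { BEGIN(EXPECT_NAME); return RECORD; }

[_a-zA-Z0-9]+   { yylval.strval = strdup(yytext); return NAME; }

    /* After SCHEMA or RECORD, the next word is a name even if it
       spells a keyword. The condition is exclusive, so the keyword
       rules above are not active here. */
<EXPECT_NAME>[_a-zA-Z0-9]+  {
                yylval.strval = strdup(yytext);
                BEGIN(INITIAL);
                return NAME;
            }

<*>[ \t\r\n]+   { /* skip whitespace in every start condition */ }

<EXPECT_NAME>.  { BEGIN(INITIAL); return yytext[0]; }
.               { return yytext[0]; }
```

The messy part is that the scanner now has to know which keywords are followed by a name, which duplicates knowledge that otherwise lives only in the grammar.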