6
votes

I'm using Flex and Bison to build a parser, but I'm having problems with the start states in my scanner.

I'm using exclusive rules to deal with commenting, but this grammar doesn't seem to match quoted tokens:

%x COMMENT

//                    { BEGIN(COMMENT); }
<COMMENT>[^\n]        ;
<COMMENT>\n           { BEGIN(INITIAL); }

"=="                  { return EQUALEQUAL; }

.                     ;

In this simple example the line:

// a == b

isn't matched entirely as a comment, unless I include this rule:

<COMMENT>"=="             ;

How do I get round this without having to add all these tokens into my exclusive rules?

3 Answers

9
votes

Matching C-style comments in Lex/Flex is well documented, both in the documentation and in various variations around the Internet.

Here is a variation on that found in the Flex documentation:

    %x IN_COMMENT

    <INITIAL>{
      "//"      BEGIN(IN_COMMENT);
    }
    <IN_COMMENT>{
      \n        BEGIN(INITIAL);
      [^\n]+    /* eat comment */
      "/"       /* eat the lone / (never reached: [^\n]+ already matches it) */
    }
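
For readers who find start conditions easier to follow as an explicit state machine, here is a minimal hand-written C sketch of the same idea. It is not what flex generates; the names `scan_state` and `strip_line_comments` are hypothetical, and it assumes, like the rules above, that the newline ending a comment is consumed along with it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names mirroring the two start conditions in the rules above. */
enum scan_state { STATE_INITIAL, STATE_IN_COMMENT };

/*
 * Copy `src` to `dst`, dropping everything from "//" through the next
 * newline (the newline is consumed too, as in the rules above).
 * `dst` must have room for strlen(src) + 1 bytes.
 * Returns the number of bytes written, excluding the terminator.
 */
size_t strip_line_comments(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);

    enum scan_state state = STATE_INITIAL;
    size_t out = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        if (state == STATE_INITIAL) {
            if (src[i] == '/' && src[i + 1] == '/') {
                state = STATE_IN_COMMENT;   /* like BEGIN(IN_COMMENT) */
                i++;                        /* skip the second '/' */
            } else {
                dst[out++] = src[i];        /* ordinary text passes through */
            }
        } else if (src[i] == '\n') {
            state = STATE_INITIAL;          /* like BEGIN(INITIAL) */
        }
        /* any other character inside a comment is discarded */
    }
    dst[out] = '\0';
    return out;
}
```

Because only the comment state looks at characters after `//`, tokens such as `==` inside a comment never reach the normal rules, which is the behaviour an exclusive start condition is meant to give.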
2
votes

Try adding a "+" after the [^\n] rule. I don't know why '==' is still being picked up inside an exclusive state, but apparently it is. Flex normally chooses the rule that matches the most text, so adding the "+" makes the comment rule match at least as much text as the "==" rule. Putting the COMMENT rule first will cause it to be used in case of a tie.
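
The longest-match and first-rule-wins behaviour described here can be illustrated with a toy model in C. This is only a sketch of the selection rule, not flex itself: the function names are hypothetical, and it assumes both rules are active at once, which is this answer's hypothesis. According to the flex manual, rules without a start condition are inactive in an exclusive state, so the real cause may lie elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length matched at `s` by [^\n] (greedy == 0) or [^\n]+ (greedy != 0). */
static size_t match_non_newline(const char *s, int greedy)
{
    size_t n = 0;
    while (s[n] != '\0' && s[n] != '\n') {
        n++;
        if (!greedy)
            break;              /* the single-character form stops after one */
    }
    return n;
}

/* Length matched at `s` by the literal "==": 2 on a match, otherwise 0. */
static size_t match_equal_equal(const char *s)
{
    return strncmp(s, "==", 2) == 0 ? 2 : 0;
}

/*
 * Pick a rule the way the answer describes: the longest match wins, and on
 * a tie the rule listed first (index 0, the comment rule) wins.
 * Returns 0 for the comment rule, 1 for "==", -1 if neither matches.
 */
int pick_rule(const char *s, int greedy)
{
    assert(s != NULL);

    size_t len[2] = { match_non_newline(s, greedy), match_equal_equal(s) };
    int best = -1;
    size_t best_len = 0;

    for (int i = 0; i < 2; i++) {
        if (len[i] > best_len) {    /* strict '>' keeps the earlier rule on ties */
            best = i;
            best_len = len[i];
        }
    }
    return best;
}
```

Under this model, `pick_rule("== b", 0)` selects the `"=="` rule because two characters beat one, while `pick_rule("== b", 1)` selects the comment rule because `[^\n]+` consumes the whole line.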

0
votes

The clue is:

The problem is this 'eat comment' rule doesn't seem to match tokens with more than one character

so add a * to match zero or more non-newline characters. You want zero or more; otherwise an empty comment will not match.

%x COMMENT

"//"                  { BEGIN(COMMENT); }
<COMMENT>[^\n]*       ;
<COMMENT>\n           { BEGIN(INITIAL); }

"=="                  { return EQUALEQUAL; }

.                     ;