I'm building a parser for a language containing preprocessor instructions in special preprocessor sections (enclosed by { and }). One of them is similar to the C #define.
I'd like to lex the file in one run using an island grammar for the preprocessor parts.
When I hit the #define instruction, I'd like to switch to another island grammar that contains all the tokens (approx. 200) of the "regular" part except the preprocessor section start token, emits its tokens on a different channel, and has a stop token which returns to the preprocessor island grammar. Removing the preprocessor section start token { is not vital, since the files I parse are valid, but it would be nice.
Is there a way to "reuse" the lexer rules for two modes? (I can emit to a named, non-constant channel whose value I could change upon entering/leaving the island.)
Here's a sample source file:
int a = 42;
{ // start preprocessor section
// simple single line #define
#define ABC 42
// will be fixed to "2 * 42" even if ABC is changed later on
#define DEF 2 * ABC
// multi-line define (all but the last line need a "\" before the newline)
#define GHI 3 \
+ 4
// the definition can contain (almost) arbitrary code, except line comments, preprocessor sections and preprocessor statements
#define JKL if (a > 23) then b = c + d; str = "} <- this must not be the end of the preprocessor section"; end_if;
} // end preprocessor section
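
To make the idea concrete, here is a rough hand-rolled sketch in Python of the behaviour I'm after. It is not tied to any lexer generator, and every name in it (SHARED_RULES, MODE_RULES, CHANNEL, tokenize) is made up for illustration. The regular rules are written once and used by both the normal mode and the #define island, and the channel is just a value looked up from the current mode. For brevity the macro name is folded into the DEFINE token and only a handful of the ~200 regular tokens are modelled.

```python
import re

# Regular-code rules, written once and reused by two modes (tiny hypothetical subset).
SHARED_RULES = [
    ("STRING", r'"[^"\n]*"'),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=<>;()]"),
]

WS = ("WS", r"\s+", None)
COMMENT = ("COMMENT", r"//[^\n]*", None)

# Mode-specific rules: (name, regex, action). Action is "push:<MODE>", "pop" or None.
MODE_RULES = {
    "DEFAULT": [("PP_OPEN", r"\{", "push:PREPROC"), WS, COMMENT],
    "PREPROC": [
        ("PP_CLOSE", r"\}", "pop"),
        ("DEFINE", r"#define[ \t]+[A-Za-z_]\w*", "push:DEFINE"),
        WS,
        COMMENT,
    ],
    # No "{" rule here, so the section start token is not part of this island.
    "DEFINE": [
        ("CONT", r"\\\r?\n", None),        # backslash-newline continues the definition
        ("DEFINE_END", r"\r?\n", "pop"),   # a plain newline is the stop token
        ("WS", r"[ \t]+", None),
    ],
}

USES_SHARED = {"DEFAULT", "DEFINE"}                   # modes that reuse the regular rules
CHANNEL = {"DEFAULT": 0, "PREPROC": 1, "DEFINE": 2}   # channel depends on the mode
SKIP = {"WS", "COMMENT", "CONT", "DEFINE_END"}        # matched but not emitted


def tokenize(text):
    """Return a list of (token name, matched text, channel) tuples."""
    compiled = {}
    for mode, own in MODE_RULES.items():
        rules = list(own)  # mode-specific rules are tried first
        if mode in USES_SHARED:
            rules += [(n, p, None) for n, p in SHARED_RULES]
        compiled[mode] = [(n, re.compile(p), a) for n, p, a in rules]

    stack, pos, out = ["DEFAULT"], 0, []
    while pos < len(text):
        mode = stack[-1]
        for name, pattern, action in compiled[mode]:
            m = pattern.match(text, pos)
            if m:
                break
        else:
            raise SyntaxError(f"unexpected {text[pos]!r} at offset {pos} in mode {mode}")
        if name not in SKIP:
            out.append((name, m.group(), CHANNEL[mode]))
        if action == "pop":
            stack.pop()
        elif action:
            stack.append(action.split(":", 1)[1])
        pos = m.end()
    return out


if __name__ == "__main__":
    sample = 'int a = 42;\n{\n#define JKL str = "} <- not the end";\n}\n'
    for token in tokenize(sample):
        print(token)
```

In this sketch the "}" inside the string literal never closes the section, because the DEFINE mode consumes it as part of a STRING token before the PREPROC mode gets to see it. What I'd like to know is whether the same kind of rule sharing can be expressed in the grammar itself instead of duplicating all the rules per mode.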