The ANTLR website describes two approaches to implementing "include" directives. The first is to recognize the directive in the lexer and include the file lexically, by pushing the CharStream onto a stack and replacing it with one that reads the new file. The second is to recognize the directive in the parser, launch a sub-parser to parse the new file, and splice in the AST generated by the sub-parser. Neither of these is quite what I need.
In the language I'm parsing, recognizing the directive in the lexer is impractical for a few reasons:
- There is no self-contained character pattern that always means "this is an include directive". For example,

      Include "foo";

  at top level is an include directive, but in

      Array bar --> Include "foo";

  or

      Constant Include "foo";

  the word Include is an identifier.
- The name of the file to include may be given as a string or as a constant identifier, and such constants can be defined with arbitrarily complex expressions.
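To illustrate why the decision depends on parser context rather than on characters, here is a toy Python sketch that classifies the three examples above by looking at the token before Include. It assumes a crude hypothetical tokenizer and a hypothetical rule (Include is a directive only when it starts a statement, i.e. nothing or ';' precedes it, optionally via '#'); a real Inform parser would need much more context than one token of lookbehind.

```python
import re

# Crude tokenizer for this illustration only: string literals, '-->',
# identifiers, and any other single non-space character.
TOKEN_RE = re.compile(r'"[^"]*"|-->|[A-Za-z_]\w*|\S')


def tokenize(text):
    return TOKEN_RE.findall(text)


def include_is_directive(tokens, i):
    """Return True if tokens[i] is 'Include' starting an include directive.

    Hypothetical rule: it must begin a statement, meaning it is preceded by
    nothing or by ';', optionally with a '#' in between.
    """
    if tokens[i].lower() != "include":
        return False
    j = i - 1
    if j >= 0 and tokens[j] == "#":
        j -= 1  # skip the optional '#' prefix
    return j < 0 or tokens[j] == ";"


for source in ['Include "foo";', 'Array bar --> Include "foo";', 'Constant Include "foo";']:
    toks = tokenize(source)
    idx = next(k for k, t in enumerate(toks) if t.lower() == "include")
    print(f"{source!r}: directive={include_is_directive(toks, idx)}")
```

Only the first example is classified as a directive; in the other two the same characters are an identifier, which is the information a pure lexer lacks.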
So I want to trigger the inclusion from the parser. But to perform the inclusion, I can't launch a sub-parser and splice the AST together; I have to splice the tokens. It's legal for a block to begin with { in the main file and be terminated by } in the included file. A file included inside a function can even close the function definition and start a new one.
It seems like I'll need something like the first approach but at the level of TokenStreams instead of CharStreams. Is that a viable approach? How much state would I need to keep on the stack, and how would I make the parser switch back to the original token stream instead of terminating when it hits EOF? Or is there a better way to handle this?
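A minimal sketch of the token-level stack, written in Python rather than against a real ANTLR runtime: each stack frame holds only a file name (for error messages) and that file's token iterator, and EOF is reported only when the outermost file is exhausted, so the parser never sees the end of an included file. The class and method names (IncludeTokenSource, push_include, next_token) are hypothetical, and tokens are plain strings.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional

EOF = "<EOF>"


@dataclass
class Frame:
    """One include level: the file name and the iterator over its tokens."""
    filename: str
    tokens: Iterator[str]


class IncludeTokenSource:
    """Hypothetical token source that splices included files at the token level."""

    def __init__(self, filename: str, tokens: List[str]):
        self.stack: List[Frame] = [Frame(filename, iter(tokens))]

    def push_include(self, filename: str, tokens: List[str]) -> None:
        # Called by the parser right after it has consumed a complete include directive.
        self.stack.append(Frame(filename, iter(tokens)))

    def current_file(self) -> Optional[str]:
        return self.stack[-1].filename if self.stack else None

    def next_token(self) -> str:
        while self.stack:
            tok = next(self.stack[-1].tokens, None)
            if tok is not None:
                return tok
            self.stack.pop()  # this file is finished; resume in the includer
        return EOF  # only reached when the outermost file is exhausted


# Usage: the '}' in x.h closes the '{' opened in main.
files = {"x.h": ["b", ";", "}"]}
src = IncludeTokenSource("main", ["{", "a", ";", "#", "Include", '"x.h"', ";", "c", ";"])
out = []
while (tok := src.next_token()) != EOF:
    out.append(tok)
    # Toy stand-in for the parser: once "Include <name> ;" is consumed, push the file.
    if len(out) >= 3 and out[-3] == "Include" and out[-1] == ";":
        name = out[-2].strip('"')
        src.push_include(name, files[name])
print(out)
```

One caveat: if the parser reads through a token stream that buffers tokens ahead of the parser, tokens already buffered past the directive would come out before the included file's tokens, so the push would have to happen before the parser looks ahead past the directive's ';', or the buffer would have to be discarded. This sketch sidesteps that because the loop pulls one token at a time.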
==========
Here's an example of the language, demonstrating that blocks opened in the main file can be closed in the included file (and vice versa). Note that the # before Include is required when the directive is inside a function, but optional outside.
main.inf:

    [ Main;
        print "This is Main!";
        if (0) {
            #include "other.h";
            print "This is OtherFunction!";
    ];

other.h:

        } ! end if
    ]; ! end Main

    [ OtherFunction;
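The example can be checked mechanically with a toy Python sketch: tokenize both files (treating ! as a comment to end of line), test whether brackets balance in each file alone, then test the stream with other.h spliced in at the directive. The tokenizer, balanced and splice are hypothetical simplifications; splice naively treats any Include followed by a name and ';' as a directive and does not recurse.

```python
import re

# Comments (! to end of line), strings, identifiers, numbers, other single characters.
TOKEN_RE = re.compile(r'![^\n]*|"[^"]*"|[A-Za-z_]\w*|\d+|\S')
OPENERS = {"[", "{", "("}
PAIRS = {"]": "[", "}": "{", ")": "("}


def tokenize(text):
    return [t for t in TOKEN_RE.findall(text) if not t.startswith("!")]


def balanced(tokens):
    stack = []
    for t in tokens:
        if t in OPENERS:
            stack.append(t)
        elif t in PAIRS and (not stack or stack.pop() != PAIRS[t]):
            return False
    return not stack


def splice(tokens, files):
    """Replace each '#include "name";' with the tokens of that file (one level only)."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i].lower() == "include" and i + 2 < len(tokens) and tokens[i + 2] == ";":
            if out and out[-1] == "#":
                out.pop()  # the optional '#' prefix is part of the directive
            out.extend(files[tokens[i + 1].strip('"')])
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out


MAIN_INF = '''[ Main;
    print "This is Main!";
    if (0) {
        #include "other.h";
        print "This is OtherFunction!";
];
'''
OTHER_H = '''    } ! end if
]; ! end Main

[ OtherFunction;
'''

files = {"other.h": tokenize(OTHER_H)}
main = tokenize(MAIN_INF)
print(balanced(main), balanced(files["other.h"]), balanced(splice(main, files)))
```

This prints False False True: neither file is well-formed on its own, so no sub-parser could build an AST for either, but the spliced token stream is.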
Constants are defined with the Constant directive. The definition given to Constant has to be a compile-time constant, so no function calls. The language also has no text operators, so no concatenation, but it can duplicate another constant:

    Constant FOO "file.h";
    Constant BAR FOO;
    Include BAR;

– Jesse McGrew

The Constant directive in general can have arbitrarily complex expressions, since it's usually used with numbers:

    Constant FOO (BAR + 5 * BAZ);

etc., so handling Constant in the lexer is impractical. – Jesse McGrew
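Since a file name can arrive through a chain of constant aliases, the parser needs a symbol table that it fills in as it goes and consults when it reaches an Include. A minimal Python sketch, assuming every constant value is a single token (a string literal or another constant's name); resolve_include_name is a hypothetical helper, and numeric expressions such as (BAR + 5 * BAZ) are deliberately out of scope.

```python
from typing import Dict


def resolve_include_name(arg: str, constants: Dict[str, str]) -> str:
    """Follow constant aliases until a string literal is reached."""
    seen = set()
    while True:
        if len(arg) >= 2 and arg[0] == '"' and arg[-1] == '"':
            return arg[1:-1]  # reached a string literal: strip the quotes
        if arg not in constants:
            raise ValueError(f"{arg} is not a string or a defined constant")
        if arg in seen:
            raise ValueError(f"circular definition of constant {arg}")
        seen.add(arg)
        arg = constants[arg]  # follow the alias one step


# Toy driver: splitting on ';' and whitespace only works for this simple input.
source = 'Constant FOO "file.h"; Constant BAR FOO; Include BAR;'
constants: Dict[str, str] = {}
for stmt in source.split(";"):
    words = stmt.split()
    if len(words) == 3 and words[0] == "Constant":
        constants[words[1]] = words[2]
    elif len(words) == 2 and words[0] == "Include":
        print("include", resolve_include_name(words[1], constants))
```

Because the table only contains constants defined before the Include, resolution has to happen in source order, which is one more reason the directive is easier to act on from the parser than from the lexer.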