I'm using ANTLR4 to try to implement a language supporting include files, like PHP's include.

var a = 4 + 5;       // line a
include "some.inc";  // include statement 
var b = 9 * 9;       // line b

Contents of some.inc:

a *= 2;
a +== 3; // Typo here (extraneous equals sign)

I need to parse the tree such that the contents of some.inc are inserted into the location of the include statement.

How do I do that in ANTLR4?

I could, of course, build a new string and do some concatenation (e.g. lineA + getContentsOf("some.inc") + lineB) and then pass it to the lexer, but I'm afraid the line and column numbers would get messed up, so I'd rather preserve the source path, line and column.


Edit: I want to warn the author of a piece of code in the target language if they made a mistake in their code. In the example above, the author made a typo. I want to warn the user that there is an error on line 2 of some.inc. If the includes are resolved (i.e. replaced) before the whole input is passed to the lexer, then the input stream would look like this:

var a = 4 + 5;       // line a
a *= 2;
a +== 3; // Typo here (extraneous equals sign)
var b = 9 * 9;       // line b

The parser would not know that the malformed expression a +== 3 originally came from line 2 of some.inc, thus reporting the wrong position.

My current code looks like this:

CharStream cs = CharStreams.fromPath(mySourceCode);
MyLexer lexer = new MyLexer(cs);
CommonTokenStream tokenStream = new CommonTokenStream(lexer);
MyParser parser = new MyParser(tokenStream);
System.out.println(parser.startRule());
If you don't want your include to work like a preprocessor-based include (which I assume you don't since you've mentioned PHP's include, not C's #include), you definitely shouldn't do this at the source or token level. What are you doing after parsing the source? Generating byte code? Evaluating the AST directly in a visitor? Is there a reason why you can't resolve includes at that stage rather than during parsing? - sepp2k
@sepp2k I do not know much about C's include directive, so I can't say whether it is to work like a preprocessor-based include or not. However, the purpose is if there are problems within the included file (some.inc in my case), I want the user to see the line and column numbers where the problem is. If this would work in an earlier stage, that'd be okay for me. - MC Emperor
"If it would work in an earlier stage, that'd be okay for me." Did you mean later stage? Because that's what I'd suggest: Just parse an include statement as an include statement without doing anything special, then parse and execute the included file only when the include statement is reached in the interpreter (assuming you are writing an interpreter). - sepp2k
@sepp2k I mean "later" indeed. But it's not entirely clear to me how I would achieve that. With a listener or visitor I suppose? - MC Emperor
The specifics depend on what your current code looks like. Like, without supporting includes, how are you interpreting and/or compiling your language currently - what are you doing after parsing? - sepp2k

1 Answer

Since no one has given an answer, let's get the ball rolling....

In the past, when I had problems like this, two solutions came to mind:

1. C preprocessor

The C preprocessor that comes with a C compiler such as gcc or clang (usually called cpp) can be used:

    /* In file mygrammar.g4 */
    var a = 4 + 5;       // line a
    #include "some.inc"  // include statement. Note: no ";"
    var b = 9 * 9;       // line b

To process it, run:

cpp /tmp/mygrammar.g4 | grep -v '^#' > /tmp/mygrammar-cpp.g4

The grep is needed to remove the line-number directives. Without that the output would look like:

$ cpp /tmp/mygrammar.g4 
# 1 "/tmp/mygrammar.g4"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "/tmp/mygrammar.g4"

    var a = 4 + 5;
# 1 "/tmp/some.inc" 1
...
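
Those line markers also carry the information the question asks for (original file and line), so instead of throwing them away with grep one could strip them in code and keep a per-line table of origins. Below is a minimal Java sketch of that idea. The class and method names (LineMarkerMap, strip, Origin) are made up for illustration, the sample input in main is hand-written rather than real cpp output, and the sketch assumes markers have the form # N "file" followed by optional flags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: strips cpp-style line markers and remembers where each line came from.
class LineMarkerMap {

    /** Original file and 1-based line of one line of preprocessed output. */
    static final class Origin {
        final String file;
        final int line;
        Origin(String file, int line) { this.file = file; this.line = line; }
    }

    /** Marker-free lines plus, at the same index, the origin of each line. */
    static final class Result {
        final List<String> lines = new ArrayList<>();
        final List<Origin> origins = new ArrayList<>();
    }

    // Assumed marker shape:  # 1 "/tmp/some.inc" 1   (an optional "line" keyword is also accepted)
    private static final Pattern MARKER =
            Pattern.compile("#\\s*(?:line\\s+)?(\\d+)\\s+\"([^\"]*)\".*");

    static Result strip(String preprocessed, String defaultFile) {
        Result result = new Result();
        String file = defaultFile;
        int line = 1;
        for (String text : preprocessed.split("\\R")) {
            Matcher m = MARKER.matcher(text);
            if (m.matches()) {
                // A marker says: the next line is line N of the named file.
                line = Integer.parseInt(m.group(1));
                file = m.group(2);
            } else {
                result.lines.add(text);
                result.origins.add(new Origin(file, line));
                line++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written sample in the assumed marker format.
        String sample = String.join("\n",
                "# 1 \"main.src\"",
                "var a = 4 + 5;",
                "# 1 \"some.inc\" 1",
                "a *= 2;",
                "a +== 3;",
                "# 3 \"main.src\" 2",
                "var b = 9 * 9;");
        Result r = strip(sample, "main.src");
        for (int i = 0; i < r.lines.size(); i++) {
            Origin o = r.origins.get(i);
            System.out.println(o.file + ":" + o.line + "  " + r.lines.get(i));
        }
    }
}
```

If the stripped lines are joined with newlines and fed to the lexer, an error on a token could then be reported with origins.get(token.getLine() - 1), since ANTLR4 token lines are 1-based. Column numbers are only as reliable as the preprocessor's handling of whitespace and comments.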

2. m4 macro preprocessor

POSIX systems often have m4 installed, a macro processor that handles includes and other kinds of macros.

Here is an example:

/* In file mygrammar.g4 */
var a = 4 + 5;
include(`some.inc') // Note `..' to quote the include filename. Again, no semicolon.
var b = 9 * 9;

And to run:

$ m4 /tmp/mygrammar.g4 > /tmp/mygrammar-m4.g4