
The ANTLR website describes two approaches to implementing "include" directives. The first approach is to recognize the directive in the lexer and include the file lexically (by pushing the CharStream onto a stack and replacing it with one that reads the new file); the second is to recognize the directive in the parser, launch a sub-parser to parse the new file, and splice in the AST generated by the sub-parser. Neither of these is quite what I need.

In the language I'm parsing, recognizing the directive in the lexer is impractical for a few reasons:

  1. There is no self-contained character pattern that always means "this is an include directive". For example, Include "foo"; at top level is an include directive, but in Array bar --> Include "foo"; or Constant Include "foo"; the word Include is an identifier.
  2. The name of the file to include may be given as a string or as a constant identifier, and such constants can be defined with arbitrarily complex expressions.

So I want to trigger the inclusion from the parser. But to perform the inclusion, I can't launch a sub-parser and splice the AST together; I have to splice the tokens. It's legal for a block to begin with { in the main file and be terminated by } in the included file. A file included inside a function can even close the function definition and start a new one.

It seems like I'll need something like the first approach but at the level of TokenStreams instead of CharStreams. Is that a viable approach? How much state would I need to keep on the stack, and how would I make the parser switch back to the original token stream instead of terminating when it hits EOF? Or is there a better way to handle this?
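For the token-level variant of the first approach, a minimal sketch (independent of ANTLR) of how little state the stack needs: one token iterator per open file. Tokens are plain strings here, and StackedTokenSource, pushInclude and the toy include-detection loop in demo() are hypothetical names for illustration only. The point is that an exhausted included file is popped instead of producing EOF, so the parser sees a single EOF when the outermost file ends. One caveat: a buffered stream such as ANTLR's CommonTokenStream may already have fetched tokens past the directive by the time a parser action runs, so pushing at the token-source level can be too late; splicing into the stream's buffer avoids that.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class StackedTokenSource {
    public static final String EOF = "<EOF>";

    // One iterator per open file; the top of the stack is the file being read.
    private final Deque<Iterator<String>> stack = new ArrayDeque<>();

    public StackedTokenSource(Iterator<String> mainFile) {
        stack.push(mainFile);
    }

    // Called (e.g. from a parser action) once an include directive is recognized.
    public void pushInclude(Iterator<String> includedFile) {
        stack.push(includedFile);
    }

    // Returns the next token. An exhausted file is popped so reading resumes in
    // the file that included it; EOF is returned only when every file is done.
    public String nextToken() {
        while (!stack.isEmpty()) {
            Iterator<String> top = stack.peek();
            if (top.hasNext()) {
                return top.next();
            }
            stack.pop();
        }
        return EOF;
    }

    // Hypothetical demo mirroring the question's main.inf / other.h example.
    public static List<String> demo() {
        List<String> mainFile = Arrays.asList("[", "Main", ";", "if", "(", "0", ")", "{",
                "Include", "other.h", ";", "print", "done", ";", "]", ";");
        List<String> otherFile = Arrays.asList("}", "]", ";", "[", "OtherFunction", ";");

        StackedTokenSource source = new StackedTokenSource(mainFile.iterator());
        List<String> seen = new ArrayList<>();
        for (String t = source.nextToken(); !t.equals(EOF); t = source.nextToken()) {
            seen.add(t);
            int n = seen.size();
            // Toy stand-in for a parser action: after "Include <name> ;" open the file.
            if (t.equals(";") && n >= 3 && seen.get(n - 3).equals("Include")) {
                source.pushInclude(otherFile.iterator());
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", demo()));
    }
}
```

Running main prints the merged stream, [ Main ; if ( 0 ) { Include other.h ; } ] ; [ OtherFunction ; print done ; ] ; with the included tokens appearing right after the directive's semicolon.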

==========

Here's an example of the language, demonstrating that blocks opened in the main file can be closed in the included file (and vice versa). Note that the # before Include is required when the directive is inside a function, but optional outside.

main.inf:

[ Main;
  print "This is Main!";
  if (0) {
  #include "other.h";
  print "This is OtherFunction!";
];

other.h:

  } ! end if
];  ! end Main

[ OtherFunction;
Is this a language designed for masochists? - Damien_The_Unbeliever
Heh. This is Inform, a language for writing text adventure games. The English-like Inform 7 (inform7.com) is an example of the elegant and awesome things you can do when your design isn't constrained by traditional context-free parsing tools. Unfortunately, I'm parsing Inform 6, which is an example of the unspeakably awful things you can do when your design isn't constrained by traditional context-free parsing tools. - Jesse McGrew
@BartKiers Luckily, no: the filename can only be a quoted string or a single identifier that was defined earlier with Constant. The definition given to Constant has to be a compile-time constant, so no function calls. The language also has no text operators, so no concatenation, but it can duplicate another constant: Constant FOO "file.h"; Constant BAR FOO; Include BAR; - Jesse McGrew
But the Constant directive in general can have arbitrarily complex expressions, since it's usually used with numbers: Constant FOO (BAR + 5 * BAZ); etc., so handling Constant in the lexer is impractical. - Jesse McGrew
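A minimal sketch of resolving an Include argument under the rules described in these comments: a quoted string is used as-is (minus its quotes), and an identifier must name an earlier string-valued Constant, which may itself be a copy of another constant. The class IncludeTargets and its methods are hypothetical; numeric constants such as Constant FOO (BAR + 5 * BAZ); are deliberately out of scope and would need a real expression evaluator.

```java
import java.util.HashMap;
import java.util.Map;

public class IncludeTargets {
    // Hypothetical table of string-valued constants, filled as Constant directives are parsed.
    private final Map<String, String> constants = new HashMap<>();

    // Constant NAME <value>; where <value> is a quoted string or an earlier string constant.
    // Resolving at definition time means a chain like BAR -> FOO needs only one lookup later.
    public void defineConstant(String name, String value) {
        constants.put(name, resolve(value));
    }

    // Strip the quotes from a string literal, or look up a previously defined constant.
    public String resolve(String arg) {
        if (arg.length() >= 2 && arg.startsWith("\"") && arg.endsWith("\"")) {
            return arg.substring(1, arg.length() - 1);
        }
        String value = constants.get(arg);
        if (value == null) {
            throw new IllegalArgumentException("not a string constant: " + arg);
        }
        return value;
    }

    public static void main(String[] args) {
        IncludeTargets targets = new IncludeTargets();
        targets.defineConstant("FOO", "\"file.h\"");   // Constant FOO "file.h";
        targets.defineConstant("BAR", "FOO");          // Constant BAR FOO;
        System.out.println(targets.resolve("BAR"));    // Include BAR;  -> file.h
    }
}
```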


One possibility: for each Include statement, have your parser create a new instance of your lexer for the included file and insert the tokens it produces into the parser's token stream at the index the parser is currently at (see the insertTokens(...) method in the parser's @members block).
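The core of this idea can be shown without ANTLR: keep a token buffer and a cursor, and splice the included file's tokens (minus its EOF) in at the cursor. In this sketch tokens are plain strings, and splice, expand and balanced are hypothetical helpers for illustration only. The bracket check shows why the splicing has to happen at the token level: neither file's brackets balance on its own, but the combined stream does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpliceDemo {
    public static final String EOF = "<EOF>";

    // Insert the included tokens (without their EOF) at `cursor`, the index of the
    // next token the parser will consume, so parsing simply continues into them.
    public static void splice(List<String> buffer, int cursor, List<String> included) {
        List<String> extra = new ArrayList<>(included);
        if (!extra.isEmpty() && extra.get(extra.size() - 1).equals(EOF)) {
            extra.remove(extra.size() - 1);
        }
        buffer.addAll(cursor, extra);
    }

    // Walk the buffer like a parser would; each "Include <name> ;" splices in that file.
    public static List<String> expand(List<String> mainTokens, Map<String, List<String>> files) {
        List<String> buffer = new ArrayList<>(mainTokens);
        int cursor = 0;
        while (!buffer.get(cursor).equals(EOF)) {
            if (buffer.get(cursor).equals("Include")) {
                String name = buffer.get(cursor + 1);
                cursor += 3; // consume Include, the file name and ';'
                splice(buffer, cursor, files.get(name));
            } else {
                cursor++;
            }
        }
        return buffer;
    }

    // True if every '{' / '[' is closed by the matching '}' / ']' in order.
    public static boolean balanced(List<String> tokens) {
        Deque<String> open = new ArrayDeque<>();
        for (String t : tokens) {
            if (t.equals("{") || t.equals("[")) {
                open.push(t);
            } else if (t.equals("}") || t.equals("]")) {
                String expected = t.equals("}") ? "{" : "[";
                if (open.isEmpty() || !open.pop().equals(expected)) {
                    return false;
                }
            }
        }
        return open.isEmpty();
    }

    public static void main(String[] args) {
        List<String> mainFile = Arrays.asList("[", "Main", ";", "{", "Include", "other.h", ";",
                "print", ";", "]", ";", EOF);
        Map<String, List<String>> files = new HashMap<>();
        files.put("other.h", Arrays.asList("}", "]", ";", "[", "Other", ";", EOF));

        List<String> all = expand(mainFile, files);
        System.out.println(String.join(" ", all));
        System.out.println(balanced(mainFile) + " " + balanced(all));
    }
}
```

Running main prints [ Main ; { Include other.h ; } ] ; [ Other ; print ; ] ; <EOF> followed by false true. Only the outer file's EOF survives, which is what lets the parser run straight through the included tokens.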

Here's a quick demo:

Inform6.g

grammar Inform6;

options {
  output=AST;
}

tokens {
  STATS;
  F_DECL;
  F_CALL;
  EXPRS;
}

@parser::header {
  import java.util.List;
  import java.util.Map;
  import java.util.HashMap;
}

@parser::members {
  private Map<String, String> memory = new HashMap<String, String>(); 

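  // Remember a Constant's value: a quoted string literal (quotes stripped)
  // or a copy of a previously defined constant.
  private void putInMemory(String key, String str) {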
    String value;
    if(str.startsWith("\"")) {
      value = str.substring(1, str.length() - 1);
    }
    else {
      value = memory.get(str);
    }
    memory.put(key, value);
  }

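  // Lex `fileName` with a fresh lexer and splice the resulting tokens into this
  // parser's token stream, so parsing continues with them right after the Include.
  private void insertTokens(String fileName) {
    if(fileName == null) {
      throw new RuntimeException("cannot resolve include file name (undefined constant?)");
    }
    // strip the quotes in case `fileName` is the text of a Str-token
    if(fileName.length() >= 2 && fileName.startsWith("\"") && fileName.endsWith("\"")) {
      fileName = fileName.substring(1, fileName.length() - 1);
    }
    try {
      CommonTokenStream thatStream = new CommonTokenStream(new Inform6Lexer(new ANTLRFileStream(fileName)));
      thatStream.fill();
      List extraTokens = thatStream.getTokens();
      // drop the included file's EOF so the parser carries on with the rest of this file
      if(!extraTokens.isEmpty() && ((Token)extraTokens.get(extraTokens.size() - 1)).getType() == Token.EOF) {
        extraTokens.remove(extraTokens.size() - 1);
      }
      CommonTokenStream thisStream = (CommonTokenStream)this.getTokenStream();
      // index() is the position of the next token the parser will consume
      thisStream.getTokens().addAll(thisStream.index(), extraTokens);
    } catch(java.io.IOException e) {
      throw new RuntimeException("could not read include file " + fileName, e);
    }
  }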
}

parse
 : stats EOF -> stats
 ;

stats
 : stat* -> ^(STATS stat*)
 ;

stat
 : function_decl
 | function_call
 | include
 | constant
 | if_stat
 ;

if_stat
 : If '(' expr ')' '{' stats '}' -> ^(If expr stats)
 ;

function_decl
 : '[' id ';' stats ']' ';' -> ^(F_DECL id stats)
 ;

function_call
 : Id exprs ';' -> ^(F_CALL Id exprs)
 ;

include
 : Include Str ';' {insertTokens($Str.text);}            -> /* omit statement from AST */
 | Include id ';'  {insertTokens(memory.get($id.text));} -> /* omit statement from AST */
 ;

constant
 : Constant id expr ';' {putInMemory($id.text, $expr.text);} -> ^(Constant id expr)
 ;

exprs
 : expr (',' expr)* -> ^(EXPRS expr+)
 ;

expr
 : add_expr
 ;

add_expr
 : mult_expr (('+' | '-')^ mult_expr)*
 ;

mult_expr
 : atom (('*' | '/')^ atom)*
 ;

atom
 : id
 | Num
 | Str
 | '(' expr ')' -> expr
 ;

id
 : Id
 | Include
 ;

Comment  : '!' ~('\r' | '\n')* {skip();};
Space    : (' ' | '\t' | '\r' | '\n')+ {skip();};
If       : 'if';
Include  : 'Include';
Constant : 'Constant';
Id       : ('a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | '0'..'9')*;
Str      : '"' ~'"'* '"';
Num      : '0'..'9'+ ('.' '0'..'9'+)?;

main.inf

Constant IMPORT "other.h";

[ Main;
  print "This is Main!";
  if (0) {    

  Include IMPORT;

  print "This is OtherFunction!";
];

other.h

  } ! end if
];  ! end Main

[ OtherFunction;

Main.java

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    // create lexer & parser
    Inform6Lexer lexer = new Inform6Lexer(new ANTLRFileStream("main.inf"));
    Inform6Parser parser = new Inform6Parser(new CommonTokenStream(lexer));

    // print the AST
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT((CommonTree)parser.parse().getTree());
    System.out.println(st);
  }
}

To run the demo, do the following on the command line:

java -cp antlr-3.3.jar org.antlr.Tool Inform6.g 
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main

The output you'll see is a DOT description of the resulting AST, which you can render as an image with Graphviz (e.g. dot -Tpng).