4
votes

I have an ANTLR grammar which creates an AST, and then I have written two tree grammars that create tree parsers for doing two passes on the AST for semantic analysis purposes. (After that I do another pass and generate output code with StringTemplate)

Everything works fine so far, but I am trying to extend the language to support parametric polymorphism in functions (while so far it only supported "simple" functions).

So for example I want to have something like this:

T getMax<T>  (T a, T b) {
     if (a > b) return a;
   return b;
}

and generate simple, non parametrically polymorphic code, according to the actual types with which the function is called. For example if someone calls getMax<int> (5) then I will only generate the code for int getMax(int a, int b)

So far, in the 1st pass, I check all the calls to the polymorphic function, and save the specific types with which the function was called with.

So, at this point I know all the parametric types, and all the actual types that they need to be replaced with.

In the 2nd pass, I want to actually modify my syntax tree and replace this parametrically polymorphic function declaration, with 1 or more function declarations that have specific types.

So, the question is

  1. What is the best way to copy and create "sibling" nodes (with also all their children) in the AST and insert them next to the original function declaration? Do I just say something like

          {
           myTreeNode.parent.addChild(myTreeNode.dupNode());  
          }
    
  2. What is the best way to replace all occurrences of type T with, say, int in the newlycreated subtree in the above example? I think a simple rewrite rule is not enough, as I also need to replace all the types in the body of the function. Do I need to write another tree grammar that will work just on this function declaration subtree and do all the replacing? Is it easier to do it manually?

Sorry if this is too confusing.

Edit:

Let's say your input is this:

T add<T> (T a, T b) { return a+b }
add<int>(1, 2)
add<string>('f', 'oo')

how would your AST look like after the 2nd pass?

I was thinking of deleting the original function declaration, and introducing 2 specialised declarations at its place.

So the resulting AST would be a root node (we can call it "program") with 4 children, 2 function delcarations and 2 function calls.

Printed Lisp-style:

(METHOD_DECL int add (ARG_DECL int a) (ARG_DECL int b) (BLOCK (return (EXPR (+ (EXPR a) (EXPR b)))))) 
(METHOD_DECL string add (ARG_DECL string a) (ARG_DECL string b) (BLOCK (return (EXPR (+ (EXPR a) (EXPR b)))))) 
(EXPR (CALL add (ELIST (EXPR 3) (EXPR 4)))) 
(EXPR (CALL add (ELIST (EXPR "f") (EXPR "oo"))))

The subtree that got deleted and replaced was this:

(METHOD_DECL T add (TYPEPARAMS T) (ARG_DECL T a) (ARG_DECL T b) (BLOCK (return (EXPR (+ (EXPR a) (EXPR b))))))

Also, the reason I want to delete the original parametrically polymorphic function declaration subtree, is that I don't know how to ignore it when I do the type checking. The tree parser "automatically" matches all the binary operations, return statements, arguments etc and does the type checking. So, if I leave it there it reports errors because for example T is not a proper type, so you can't use it with the + operator.

1
Sorry, I'm not 100% sure what the resulting AST would look like. Let's say your input is this: T add<T> (T a, T b) { return a+b }; add<int>(1,2); add<string>('f','oo'); how would your AST look like after the 2nd pass?Bart Kiers
Good question, I added an edit to the question.Irene Papakonstantinou
Btw, yours is a good question as well (not confusing at all).Bart Kiers

1 Answers

7
votes

Personally, I'd let the parser separate the methods from main code block. Given the input:

T add<T> (T a, T b) { 
  T c = a + b 
  return c 
}
int sub(int x, int y) { return x-y }
add<int>(1, 2)
add<string>('f', 'oo')

then the parser would produce the tree that represents:

add<int>(1, 2)
add<string>('f', 'oo')

and a separate tree representing the methods:

int    add (int a, int b)       { int c = a + b      return c   }
string add (string a, string b) { string c = a + b   return c   }
int    sub (int x, int y)       {                    return x-y }

During the parsing phase, you simply keep track of all parametric parameter (pp henceforth) -calls and -methods in two instance variables:

@parser::members {

  private Map<String, Set<Token>> ppMethodCalls = new HashMap<String, Set<Token>>();
  private Map<String, CommonTree> ppMethods = new HashMap<String, CommonTree>();

  // a separate AST for all methods
  public CommonTree methods = new CommonTree(new CommonToken(METHODS, "METHODS"));

  // ...

}

After parsing the example code, ppMethodCalls would hold:

{"add" => {Token="int", Token="string"}}

and ppMethods would hold:

{"add" => Tree=^(METHOD T add ... )} 

When the parser has fully parsed the input source, the following method is called:

  public void replacePP() {

    // iterate over all pp method calls
    for(Map.Entry<String, Set<Token>> entry : ppMethodCalls.entrySet()) {

      // get the name of the method being called
      String name = entry.getKey();

      // iterate over all tokens ('int' and 'string', in my example)
      for(Token tok : entry.getValue()) {

        // get the original pp-method instance
        CommonTree ppm = ppMethods.get(name);

        // create a deep copy from the original pp-method (will post 'copyTree(...)' later)
        CommonTree cp = Main.copyTree(ppm);

        // remove the first token from the tree (this is 'T' in my example)
        CommonTree pp = (CommonTree)cp.deleteChild(0);

        // replace all 'T' tokens with the actual parameter (either 'int' or 'string' in my example)
        replace(pp, tok, cp);

        // add the rewritten tree to the separate 'methods' tree
        methods.addChild(cp);
      }
    }
  }

  private void replace(CommonTree param, Token token, CommonTree tree) {
    if(tree == null) return;
    if(tree.getType() == param.getType() && tree.getText().equals(param.getText())) {
      tree.getToken().setType(token.getType());
      tree.getToken().setText(token.getText());
    }
    if(tree.getChildCount() > 0) {
      for(Object child : tree.getChildren()) {
        replace(param, token, (CommonTree)child);
      }
    }
  }

which will create two new trees for both add<int>(...) and add<string>(...) and add them to the instance variable: methods: CommonTree.

A little demo:

file: PP.g

grammar PP;

options {
  output=AST;
  ASTLabelType=CommonTree;
}

tokens {
  ROOT;
  METHODS;
  BLOCK;
  ASSIGN;
  ARG;
  ARGS;
  ELIST;
  METHOD;
  CALL;
}

@parser::header {
  import java.util.Map;
  import java.util.HashMap;
  import java.util.Set;
  import java.util.HashSet;
}

@parser::members {

  private Map<String, Set<Token>> ppMethodCalls = new HashMap<String, Set<Token>>();
  private Map<String, CommonTree> ppMethods = new HashMap<String, CommonTree>();
  public CommonTree methods = new CommonTree(new CommonToken(METHODS, "METHODS"));

  private void ppCall(String name, Token pp) {
    Set<Token> existing = ppMethodCalls.remove(name);
    if(existing == null) existing = new HashSet<Token>();
    existing.add(pp);
    ppMethodCalls.put(name, existing);
  }

  public void replacePP() {
    for(Map.Entry<String, Set<Token>> entry : ppMethodCalls.entrySet()) {
      String name = entry.getKey();
      for(Token tok : entry.getValue()) {
        CommonTree ppm = ppMethods.get(name);
        CommonTree cp = Main.copyTree(ppm);
        CommonTree pp = (CommonTree)cp.deleteChild(0);
        replace(pp, tok, cp);
        methods.addChild(cp);
      }
    }
  }

  private void replace(CommonTree param, Token token, CommonTree tree) {
    if(tree == null) return;
    if(tree.getType() == param.getType() && tree.getText().equals(param.getText())) {
      tree.getToken().setType(token.getType());
      tree.getToken().setText(token.getText());
    }
    if(tree.getChildCount() > 0) {
      for(Object child : tree.getChildren()) {
        replace(param, token, (CommonTree)child);
      }
    }
  }
}

parse
  :  block EOF -> block
  ;

block
  :  statement* -> ^(BLOCK statement*)
  ;

statement
  :  ppMethod     {ppMethods.put($ppMethod.name, $ppMethod.tree);} -> /* omit from main AST */
  |  normalMethod {methods.addChild($normalMethod.tree);}          -> /* omit from main AST */
  |  methodCall
  |  assignment
  |  Return expr -> ^(Return expr)
  ;

assignment
  :  type Id '=' expr -> ^(ASSIGN type Id expr)
  |  Id Id '=' expr   -> ^(ASSIGN Id Id expr)
  ;

normalMethod
  :  type Id '(' argList ')' '{' block '}' -> ^(METHOD type Id argList block)
  ;

ppMethod returns [String name]
  :  tp=Id id=Id {$name=$id.text;} '<' pp=Id '>' '(' ppArgList ')' '{' block '}' -> ^(METHOD $pp $tp $id ppArgList block)
  ;

methodCall
  :  Id ('<' type '>' {ppCall($Id.text, $type.tree.getToken());})? '(' exprList ')' -> ^(CALL Id exprList)
  ;

argList
  :  (arg (',' arg)*)? -> ^(ARGS arg*)
  ;

arg
  :  type Id -> ^(ARG type Id)
  ;

ppArgList
  :  (ppArg (',' ppArg)*)? -> ^(ARGS ppArg*)
  ;

ppArg
  :  type Id -> ^(ARG type Id)
  |  Id Id   -> ^(ARG Id Id)
  ;

exprList
  :  (expr (',' expr)*)? -> ^(ELIST expr*)
  ;

expr
  :  atom (('+' | '-')^ atom)*
  ;

atom
  :  Int
  |  Str
  |  Id
  |  methodCall
  |  '(' expr ')' -> expr
  ;

type
  :  K_Int
  |  K_Str
  ;

Return : 'return';
K_Int  : 'int';
K_Str  : 'string';
Id     : ('a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | Digit)*;
Int    : Digit+;
Str    : '\'' ~'\''* '\'';

Comment : '#' ~('\r' | '\n')* {$channel=HIDDEN;};
Space   : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};

fragment Digit : '0'..'9';

file: Main.java

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {

  public static void main(String[] args) throws Exception {
    String src =
        "T add<T> (T a, T b) {                 \n" +
        "    T c = a + b                       \n" +
        "    return c                          \n" +
        "}                                     \n" +
        "int sub(int x, int y) { return x-y }  \n" +
        "add<int>(1, 2)                        \n" +
        "add<string>('f', 'oo')                \n";
    PPLexer lexer = new PPLexer(new ANTLRStringStream(src));
    PPParser parser = new PPParser(new CommonTokenStream(lexer));
    CommonTree tree = (CommonTree)parser.parse().getTree();
    parser.replacePP();
    System.out.println(new DOTTreeGenerator().toDOT(tree));
    System.out.println(new DOTTreeGenerator().toDOT(parser.methods));
  }

  // put this in a Util-class or else in the parser   
  public static CommonTree copyTree(CommonTree original) {
    CommonTree copy = new CommonTree(original.getToken());
    copyTreeRecursive(copy, original);
    return copy;
  }

  private static void copyTreeRecursive(CommonTree copy, CommonTree original) {
    if(original.getChildren() != null) {
      for(Object o : original.getChildren()) {

        CommonTree originalChild = (CommonTree)o;

        // get the token from the original child node
        CommonToken oTok = (CommonToken)originalChild.getToken();

        // create a new token with the same type & text as 'oTok' 
        CommonToken cTok = new CommonToken(oTok.getType(), oTok.getText());

        // copy all attributes from 'oTok' to 'cTok'  
        cTok.setLine(oTok.getLine());
        cTok.setCharPositionInLine(oTok.getCharPositionInLine());
        cTok.setChannel(oTok.getChannel());
        cTok.setStartIndex(oTok.getStartIndex());
        cTok.setStopIndex(oTok.getStopIndex());
        cTok.setTokenIndex(oTok.getTokenIndex());

        // create a new tree node with the 'cTok' as token
        CommonTree copyChild = new CommonTree(cTok);

        // set the parent node of the child node
        copyChild.setParent(copy);

        // add the child to the parent node
        copy.addChild(copyChild);

        // make a recursive call to copy deeper
        copyTreeRecursive(copyChild, originalChild);
      }
    }
  }
}

If you now run the main class:

*nix

java -cp antlr-3.3.jar org.antlr.Tool PP.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main

or Windows

java -cp antlr-3.3.jar org.antlr.Tool PP.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar Main

you will see two trees being printed in DOT-code:

1

enter image description here

2

enter image description here

Note that this is just a quick hack I put together: things could be tidied up a bit more, and the returns [String name] in the ppMethod rule isn't pretty. It also does not cater for methods like this:

A foo<A,B> (A a, B b) {
  return A ~ B 
}

foo<int,string>(42, 'foo')

and perhaps many more things. I'll leave that for you, hoping you might get further with the little demo above.