I'm presently gluing together ANTLR and Clojure, trying to create a Clojure zipper over the abstract syntax tree that ANTLR returns.

The AST is a very Java-flavored set of objects, using CommonTree objects to represent the hierarchy.

I made a zipper over the CommonTree as follows:

(defn branch? [tn] (not (zero? (.getChildCount tn))))
(defn children [tn] (.getChildren tn))
(defn make [tn children] (doto (CommonTree. tn)
                           (.addChildren children)))

(defn zip-parse [f] (z/zipper branch? children make (parse f)))

(I'm not 100% sure that making CommonTree nodes that way will work. I haven't gotten far enough to verify it yet...)

I use these functions like this:

(def zip-ast (zip-parse testfile))

So far, so good. This actually works. I can navigate with the "down", "right", "left", and "up" functions. The problem arises when I try to use the zip-filter library to locate particular tokens:

(defn token [loc] (-> loc z/node .getToken .getText))

(defn token= [tokenname]
  (fn [loc]
    (filter #(and (z/branch? %) (= tokenname (token %)))
            (if (zf/auto? loc)
              (zf/children-auto loc)
              (list (zf/auto true loc))))))

(defn java->
  [loc & preds]
  (zf/mapcat-chain loc preds #(cond (string? %) (token= %))))

This is blatantly copied from Chouser's nice xml-> function. Unfortunately, it just doesn't work. Inside zip-filter, the function "auto" adds or removes metadata from the object. Except, plain old Java objects can't have metadata.

Am I barking up the wrong tree? Or (more likely), do I not understand zip-filter well enough to copy it?

1 Answer

A zipper stores the branch?, children, and make functions as metadata on the loc that wraps the node. As far as I can tell, auto adds or removes metadata on that loc wrapper (a vector), not on the node object itself. So I don't think that's the problem.
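
A minimal sketch of that point, using only clojure.zip. A plain java.lang.Object stands in for a CommonTree node, the three zipper functions are placeholders, and the :demo/flag key is made up for illustration (it is not the key zip-filter uses):

```clojure
(require '[clojure.zip :as z])

;; A plain Java object stands in for an ANTLR CommonTree node.
(def node (Object.))

;; Placeholder zipper functions: every node is treated as a leaf.
(def loc (z/zipper (constantly false) (constantly nil) (fn [n _] n) node))

(vector? loc)                        ;=> true, the loc is a Clojure vector
(contains? (meta loc) :zip/branch?)  ;=> true, zipper fns live in its metadata

;; Metadata goes on the loc, so the wrapped Java object is never touched.
;; :demo/flag is a hypothetical key used only for this demo.
(def tagged (vary-meta loc assoc :demo/flag true))

(:demo/flag (meta tagged))           ;=> true
(identical? node (z/node tagged))    ;=> true, same untouched Java object
```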

Can you explain more about what "just doesn't work"?

One place that looks fishy (and that has tripped me up with custom zippers) is the branch? function. Note that the first condition of the and in token= checks z/branch?, so a token with no children will never match in token=, which might be surprising. branch? really says whether it's possible for a node to have children, so sometimes you want to return true there even if the node doesn't actually have any. Other than this, I don't see anything obvious.
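
To make the difference concrete, here is a small sketch with plain maps standing in for CommonTree nodes (the :text and :children keys and the kids and make-node helpers are made up for this example). Note that with a permissive branch?, the children function should return nil rather than an empty collection for a childless node, otherwise z/down will step into a bogus nil node:

```clojure
(require '[clojure.zip :as z])

;; Stand-in tree: a root with a single childless node.
(def tree {:text "root" :children [{:text "leaf" :children []}]})

;; seq turns an empty child list into nil, which z/down relies on.
(defn kids [n] (seq (:children n)))
(defn make-node [n cs] (assoc n :children (vec cs)))

;; Strict: a node is a branch only if it currently has children.
(def strict (z/zipper #(boolean (kids %)) kids make-node tree))

;; Permissive: any node could have children.
(def permissive (z/zipper (constantly true) kids make-node tree))

(z/branch? (z/down strict))      ;=> false, so (and (z/branch? %) ...) drops it
(z/branch? (z/down permissive))  ;=> true, the childless node can now match
(z/down (z/down permissive))     ;=> nil, there is still nothing below it
```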

Note: In java->, since you only have one option, you can simplify the anonymous function. If it's always a string, then you can just replace the whole anonymous function with just token=. Or if you need to handle the non-string case and return nil, you might want (when (string? %) (token= %)).

I've actually attacked almost this same problem in the past along a different route. Not sure this helps at all, but just in case...

I built an Antlr grammar that produced output that I wanted to traverse and modify in Clojure as a tree. The solution I ultimately hit on was:

  • Antlr grammar ->
  • Antlr tree grammar (massaging, language agnostic) ->
  • Antlr string templates (Clojure-specific) to generate ->
  • Clojure data structures as strings ->
  • read Clojure data structures into a nested tree (records for us, but could be whatever)
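
The last step can be sketched roughly like this. The emitted string and its [:class ...] shape are invented for illustration (the real template output looked different), and clojure.edn/read-string is just one safe way to read it:

```clojure
(require '[clojure.edn :as edn]
         '[clojure.zip :as z])

;; Hypothetical text as a string template might emit it.
(def emitted "[:class \"Foo\" [:method \"bar\"] [:method \"baz\"]]")

;; Read the text into ordinary nested Clojure vectors.
(def tree (edn/read-string emitted))

;; Plain Clojure data works with the stock zippers, no custom functions needed.
(def loc (z/vector-zip tree))

(-> loc z/down z/right z/node)                         ;=> "Foo"
(-> loc z/down z/right z/right z/down z/right z/node)  ;=> "bar"
```

Because the result is plain Clojure data, the metadata question from the original post never comes up.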

One benefit of this was that the Antlr Java code did not depend on the Clojure code directly (only on the format of the strings passed between them), so the Clojure code depended only on the generated Java code. This simplified the dependency structure a bit when compiling the project (I think you've achieved this as well). Another benefit is that the grammar and tree grammar were language agnostic, so you could create Clojure and/or Java and/or other targets from the same grammar (not that I'm actually doing that).