
I am trying to write a manual tree walker in Java for an AST generated by ANTLR v3. The AST is built using island grammars, similar to the approach described in "ANTLR: call a rule from a different grammar".

In the AST, I have a node for an expression list, with each expression as a child node. Now I need to know the line numbers of the COMMAs that separated the expressions. The COMMAs were present during parsing but were removed by the AST rewrite.

I see some resources (here and here) pointing to the use of CommonTokenStream.getTokens, but I am not sure how I can access the CommonTokenStream while processing the AST. Is there any way I can get the CommonTokenStream that was used to build the AST?

1) "some resources" is a bit vague: can you point to actual resources (reference/link)? 2) If you're removing tokens from the AST during parsing, then these are (obviously) not available in your tree-walker. If you need info from a COMMA in your tree-walker, include it (or them). - Bart Kiers
Thanks Bart. I have edited the post to include the links. Just wondering if there is a way to access all the tokens between the tree nodes, since the nodes have getTokenStartIndex() and getTokenStopIndex(), without cluttering the AST with the COMMA tokens. Or do I need to extend CommonTree to include the source token stream? - Skar
The first link you're referring to is about accessing tokens that are discarded (or better: hidden) during the lexing phase. This is not what you're doing: the lexer does not hide your COMMA tokens (you're omitting them in a parser rule). However, the second link looks like it would do the trick for you. - Bart Kiers
Yes Bart, but as I am writing a manual tree walker (in Java, not an ANTLR tree parser), it looks like I need to extend CommonTree to hold the TokenStream object and set it in the parsing phase, so that it can then be accessed within the walker. - Skar

1 Answer


The complete list of tokens is accessible through CommonTokenStream.getTokens(), which you can call before you call the tree walker. The list of tokens would be an argument to the walker. There's no need to change CommonTree, unless you want the recovered information embedded in the tree.
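As a rough sketch of what the walker could do with that list: for each pair of adjacent child expressions, scan the tokens that lie between the first child's stop index and the next child's start index, and record the line of every COMMA found there. To keep the example runnable without the ANTLR jar, `Tok` below is a stand-in for `org.antlr.runtime.Token`, the `COMMA` constant is hypothetical (real code would use the constant from the generated parser), and each `{start, stop}` pair stands in for a child node's `getTokenStartIndex()` / `getTokenStopIndex()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CommaLines {
    // Stand-in for org.antlr.runtime.Token: only what this sketch needs
    // (the real interface exposes getType() and getLine()).
    static final class Tok {
        final int type;
        final int line;
        Tok(int type, int line) { this.type = type; this.line = line; }
    }

    // Hypothetical token types; real code would use the generated parser's constants.
    static final int COMMA = 1;
    static final int ID = 2;

    /**
     * tokens     : the full token list (what CommonTokenStream.getTokens() returns)
     * childSpans : one {startIndex, stopIndex} pair per child expression, in order
     * Returns the lines of the COMMA tokens found in the gaps between adjacent children.
     */
    static List<Integer> commaLinesBetween(List<Tok> tokens, int[][] childSpans) {
        List<Integer> lines = new ArrayList<>();
        for (int c = 0; c + 1 < childSpans.length; c++) {
            int from = childSpans[c][1] + 1;      // first token after this child
            int to = childSpans[c + 1][0] - 1;    // last token before the next child
            if (from < 0) continue;               // boundaries were not set on the node
            for (int i = from; i <= to && i < tokens.size(); i++) {
                Tok t = tokens.get(i);
                if (t.type == COMMA) lines.add(t.line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Token list for the input "a ,\n b ,\n c"
        List<Tok> tokens = Arrays.asList(
            new Tok(ID, 1), new Tok(COMMA, 1),
            new Tok(ID, 2), new Tok(COMMA, 2),
            new Tok(ID, 3));
        int[][] spans = { {0, 0}, {2, 2}, {4, 4} };
        System.out.println(commaLinesBetween(tokens, spans)); // prints [1, 2]
    }
}
```

Scanning only the gaps between children, rather than the whole span of the expression-list node, keeps commas that belong to nested lists (for example, arguments of a call inside one of the expressions) out of the result.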

I've used the token list to associate hidden tokens, such as comments and explicit line numbers (think FORTRAN), with the closest visible token. This was done by post-processing the AST and comparing the line, column, and char-index information, which is available for both the tokens in the list and the nodes in the AST.

My attempts at doing that during AST construction resulted in hacky, unmaintainable code. The post-processing code, OTOH, is Programming-101 algorithmic.
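To give an idea of how simple the post-processing can be, here is a minimal sketch of the hidden-token association, not my actual code. It assumes "closest" means the next visible token (falling back to the last visible token for trailing comments) and works purely on token indexes; `Tok` is again a stand-in for `org.antlr.runtime.Token`, and the channel values mirror `Token.DEFAULT_CHANNEL` and `Token.HIDDEN_CHANNEL` in ANTLR 3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HiddenTokenAttacher {
    // Same values as Token.DEFAULT_CHANNEL / Token.HIDDEN_CHANNEL in ANTLR 3.
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 99;

    // Stand-in for org.antlr.runtime.Token (getText(), getChannel()).
    static final class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
    }

    /**
     * Maps the index of a visible token to the indexes of the hidden tokens
     * attached to it. Hidden tokens go to the next visible token; trailing
     * hidden tokens go to the last visible token.
     */
    static Map<Integer, List<Integer>> attach(List<Tok> tokens) {
        Map<Integer, List<Integer>> owners = new LinkedHashMap<>();
        List<Integer> pending = new ArrayList<>();
        int lastVisible = -1;
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).channel != DEFAULT_CHANNEL) {
                pending.add(i);               // remember it until a visible token shows up
                continue;
            }
            if (!pending.isEmpty()) {
                owners.put(i, pending);       // attach everything seen since the last visible token
                pending = new ArrayList<>();
            }
            lastVisible = i;
        }
        if (!pending.isEmpty() && lastVisible >= 0) {
            owners.computeIfAbsent(lastVisible, k -> new ArrayList<>()).addAll(pending);
        }
        return owners;
    }

    public static void main(String[] args) {
        List<Tok> tokens = Arrays.asList(
            new Tok("// leading", HIDDEN_CHANNEL),
            new Tok("x", DEFAULT_CHANNEL),
            new Tok("=", DEFAULT_CHANNEL),
            new Tok("1", DEFAULT_CHANNEL),
            new Tok("// trailing", HIDDEN_CHANNEL));
        System.out.println(attach(tokens)); // prints {1=[0], 3=[4]}
    }
}
```

From there, a tree node can be matched to its token through the node's token index, or through line and column when indexes are not reliable.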