I need to write a Java program, using ANTLR4, that, given a source file with a single method, can count the number of variables, operators, punctuation symbols and reserved words.
How can I use ANTLR4 to count tokens based on their type?
After doing some research, and based on Özhan Düz, I realized that what I needed requires two techniques: lexical analysis with a Lexer, and parse tree analysis with a Parser.
For the sake of everyone who needs to do something similar in the future, here's exactly how I did it:
1) Use the ANTLR command line tool to generate a Lexer, Parser and BaseListener for your language. Instructions for how to do it can be found on the ANTLR official website. In this example I created these classes for analyzing the Java language.
2) Create a new Java project. Add JavaLexer.java, JavaListener.java, JavaParser.java and JavaBaseListener.java to your project, and add the ANTLR library to your project's build path.
3) Create a new class extending the JavaBaseListener base class. Take a look at the JavaBaseListener.java file for all the methods you can override. While the parse tree of the source code is being walked, each method is invoked when the corresponding event occurs (for example, enterMethodDeclaration() is invoked each time the walker enters a method declaration).
For example, this listener will raise a counter by 1 each time it has found a new method:
    public static final AtomicInteger count = new AtomicInteger();

    /**
     * Implementation of the abstract base listener
     */
    public static class MyListener extends JavaBaseListener {
        /**
         * Overrides the default callback called whenever the walker has entered a method declaration.
         * This raises the count every time a new method is found
         */
        @Override
        public void enterMethodDeclaration(JavaParser.MethodDeclarationContext ctx) {
            count.incrementAndGet();
        }
    }
4) Create a Lexer, a Parser, a ParseTree and a ParseTreeWalker. Then, finally, instantiate your listener and walk the ParseTree.
For example:
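    public static void main(String... args) throws IOException {
        // Path of the source file to analyze, taken from the command line
        String sourceFile = args[0];
        // Note: ANTLRFileStream is deprecated since ANTLR 4.7, where
        // CharStreams.fromFileName(sourceFile) replaces it
        JavaLexer lexer = new JavaLexer(new ANTLRFileStream(sourceFile, "UTF-8"));
        JavaParser parser = new JavaParser(new CommonTokenStream(lexer));
        ParseTree tree = parser.compilationUnit();
        ParseTreeWalker walker = new ParseTreeWalker();
        MyListener listener = new MyListener();
        walker.walk(listener, tree);
    }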
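To make the callback mechanism from steps 3 and 4 more concrete, here is a minimal sketch of how a walker drives a listener. The SketchNode, SketchListener, SketchWalker and SketchMethodCounter types are hypothetical stand-ins written for this illustration, not ANTLR's classes; the only thing assumed about ANTLR is that its ParseTreeWalker does a depth-first walk, firing the enter callback before a node's children and the exit callback after them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parse tree node: it only carries a rule name.
final class SketchNode {
    final String rule;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String rule, SketchNode... kids) {
        this.rule = rule;
        for (SketchNode kid : kids) {
            children.add(kid);
        }
    }
}

// Hypothetical stand-in for a generated listener interface.
interface SketchListener {
    void enter(SketchNode node);
    void exit(SketchNode node);
}

final class SketchWalker {
    // Depth-first walk: enter fires before the children, exit after them.
    static void walk(SketchListener listener, SketchNode node) {
        listener.enter(node);
        for (SketchNode child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }
}

// Counts nodes whose rule name is "methodDeclaration", like MyListener does.
final class SketchMethodCounter implements SketchListener {
    int count = 0;

    @Override
    public void enter(SketchNode node) {
        if ("methodDeclaration".equals(node.rule)) {
            count++;
        }
    }

    @Override
    public void exit(SketchNode node) {
        // Nothing to do when leaving a node
    }
}

final class ListenerSketchDemo {
    public static void main(String[] args) {
        SketchNode tree = new SketchNode("compilationUnit",
                new SketchNode("methodDeclaration", new SketchNode("block")),
                new SketchNode("methodDeclaration"));
        SketchMethodCounter counter = new SketchMethodCounter();
        SketchWalker.walk(counter, tree);
        System.out.println("methods found: " + counter.count);
    }
}
```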
This is the basis. The next steps depend on what you want to achieve, and this brings me back to the difference between using a Lexer and a Parser:
For basic lexical analysis of your code, like identifying operators and reserved words, use the lexer to iterate over your tokens and determine each token's type by calling Token.getType(). Use this code to count the number of reserved words inside a method:
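    private List<Token> tokenizeMethod(String method) {
        JavaLexer lex = new JavaLexer(new ANTLRInputStream(method));
        CommonTokenStream tokStream = new CommonTokenStream(lex);
        // Force the lexer to read all tokens up to (and including) EOF
        tokStream.fill();
        return tokStream.getTokens();
    }

    /**
     * Returns the number of reserved words inside the given method, using lexical analysis.
     * Relies on the grammar defining the keyword tokens first, so that their
     * types are the lowest numbered ones, with WHILE being the last keyword.
     * @param method The method text
     */
    private int countReservedWords(String method) {
        int count = 0;
        for (Token t : tokenizeMethod(method)) {
            int type = t.getType();
            // Skip the trailing EOF token: its type is -1 and would otherwise be counted
            if (type != Token.EOF && type <= JavaLexer.WHILE) {
                count++;
            }
        }
        return count;
    }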
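The same type check can be extended to the other lexical categories from the question (operators and punctuation symbols) by comparing each token type against several boundaries instead of one. The following is only a sketch: the boundary values LAST_KEYWORD, LAST_SEPARATOR and LAST_OPERATOR are invented for illustration and assume that the grammar numbers its token types as keywords first, then separators, then operators. In a real project they would be taken from the constants in your generated JavaLexer, and the ordering would have to be checked against your grammar.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of ANTLR or of the generated classes.
final class TokenCategoryCounter {

    enum Category { RESERVED_WORD, PUNCTUATION, OPERATOR, OTHER }

    // Assumed layout, for illustration only:
    // 1..50 keywords, 51..59 separators, 60..99 operators, anything else "other".
    static final int LAST_KEYWORD = 50;
    static final int LAST_SEPARATOR = 59;
    static final int LAST_OPERATOR = 99;

    // ANTLR reports end of input with token type -1 (Token.EOF).
    static final int EOF = -1;

    // Maps a single token type to its category using the assumed boundaries.
    static Category classify(int tokenType) {
        if (tokenType < 1) {
            return Category.OTHER;
        }
        if (tokenType <= LAST_KEYWORD) {
            return Category.RESERVED_WORD;
        }
        if (tokenType <= LAST_SEPARATOR) {
            return Category.PUNCTUATION;
        }
        if (tokenType <= LAST_OPERATOR) {
            return Category.OPERATOR;
        }
        return Category.OTHER;
    }

    // Counts how many token types fall into each category, ignoring EOF.
    static Map<Category, Integer> count(List<Integer> tokenTypes) {
        Map<Category, Integer> counts = new EnumMap<>(Category.class);
        for (int type : tokenTypes) {
            if (type == EOF) {
                continue;
            }
            counts.merge(classify(type), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Token types as they might come from Token.getType() for each token
        List<Integer> types = Arrays.asList(3, 55, 70, 100, 3, EOF);
        System.out.println(count(types));
    }
}
```

With real tokens, the list of types would be built by calling getType() on each Token returned by tokenizeMethod().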
For tasks that require parsing the AST, like identifying variables, methods, annotations and more, use a Parser. Use this code to count the number of variable declarations inside a method:
    /**
     * Returns the number of variable declarations inside the given method, by parsing the method's AST
     * @param method The method text
     */
    private int countVariableDeclarations(String method) {
        JavaLexer lex = new JavaLexer(new ANTLRInputStream(method));
        JavaParser parse = new JavaParser(new CommonTokenStream(lex));
        ParseTree tree = parse.methodDeclaration();
        ParseTreeWalker walker = new ParseTreeWalker();
        final AtomicInteger count = new AtomicInteger();
        walker.walk(new JavaBaseListener() {
            @Override
            public void enterLocalVariableDeclaration(JavaParser.LocalVariableDeclarationContext ctx) {
                count.incrementAndGet();
            }
        }, tree);
        return count.get();
    }
You can use a HashMap like this, inside the grammar itself, to keep track of the count for every kind of word:
    @header {
    import java.util.HashMap;
    }

    @members {
        // Maps each identifier's text to the number of times it has been seen
        HashMap<String, Integer> memory = new HashMap<String, Integer>();
    }

    Identifier
        : IdentifierNondigit ( IdentifierNondigit | Digit )* {
            if (memory.containsKey(getText())) {
                memory.put(getText(), memory.get(getText()) + 1);
            }
            else {
                memory.put(getText(), 1);
            }
            System.out.println(getText() + " : " + memory.get(getText()));
        }
        // { getText().length()<=3}?{ String str=getText(); while(str.length()<=3){ str=str+str;} setText(str);}
        | IdentifierNondigit ( IdentifierNondigit | Digit )*
        ;
In the same way, instead of keying the map on getText(), you can use a fixed key such as "reserved" and store the updated count after every increment.
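As a rough illustration of the fixed-key idea in plain Java, outside the grammar: the CategoryTally class and the key names below are hypothetical, chosen only for this sketch. It uses Map.merge, which inserts 1 for a new key and otherwise adds 1 to the stored value, so the containsKey/put branching is not needed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: keeps one running count per category key.
final class CategoryTally {
    private final Map<String, Integer> counts = new HashMap<>();

    // Adds one to the count stored under the given key, starting from zero.
    void record(String key) {
        counts.merge(key, 1, Integer::sum);
    }

    // Returns the count for the key, or 0 if the key was never recorded.
    int get(String key) {
        return counts.getOrDefault(key, 0);
    }

    public static void main(String[] args) {
        CategoryTally tally = new CategoryTally();
        // In a grammar action, each call would sit in the rule for that kind of token
        tally.record("reserved");
        tally.record("identifier");
        tally.record("reserved");
        System.out.println("reserved : " + tally.get("reserved"));
        System.out.println("identifier : " + tally.get("identifier"));
    }
}
```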