
I am using the Python3 grammar from the following location:

https://github.com/antlr/grammars-v4/blob/master/python3/Python3.g4

I have the following code to parse the file:

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;

CharStream input = CharStreams.fromFileName("Functions.py"); // throws IOException
Python3Lexer lexer = new Python3Lexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
Python3Parser parser = new Python3Parser(tokens);

ParseTree tree = parser.file_input(); // file_input is the grammar's start rule for a whole .py file
ParseTreeWalker walker = new ParseTreeWalker();

walker.walk(new Listener(), tree);

Listener.java

public class Listener extends Python3BaseListener {
    @Override
    public void enterImport_name(Python3Parser.Import_nameContext ctx) {
        System.out.println(ctx.getText());
    }

    @Override
    public void enterFuncdef(Python3Parser.FuncdefContext ctx) {
        System.out.println(ctx.getText()); // returns the whole code as a string
    }
}

I am trying to read all the imports, variables, and function names (along with their arguments) from the Python file.

How can I do this?


1 Answer


This is not a trivial problem. As a general approach to writing listeners, I recommend adding code to your program that prints the parse tree, then running it on a few different source files. That lets you decide which nodes to write listeners for, and how.
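A minimal sketch of such a printer is below. To stay self-contained it uses a hypothetical `Node` class as a stand-in for ANTLR's `ParseTree`, so only the recursion carries over. With the ANTLR runtime itself, `tree.toStringTree(parser)` already gives a one-line LISP-style dump; for an indented view you would recurse over `getChildCount()` / `getChild(i)` and label nodes with `Trees.getNodeText(node, parser)`.

```java
import java.util.Arrays;
import java.util.List;

public class TreeDump {
    // Hypothetical stand-in for ANTLR's ParseTree (not ANTLR code): a node with
    // children is a rule, a node without children is treated as a token leaf.
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Appends one line per node, indented two spaces per depth level.
    static void dump(Node node, int depth, StringBuilder out) {
        out.append("  ".repeat(depth))
           .append(node.isLeaf() ? "TOKEN " : "")
           .append(node.label)
           .append('\n');
        for (Node child : node.children) {
            dump(child, depth + 1, out);
        }
    }

    static String dump(Node root) {
        StringBuilder out = new StringBuilder();
        dump(root, 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hand-built copy of the "import collections" sub-tree.
        Node importStmt = new Node("import_stmt",
            new Node("import_name",
                new Node("import"),
                new Node("dotted_as_names",
                    new Node("dotted_as_name",
                        new Node("dotted_name",
                            new Node("collections"))))));
        System.out.print(dump(importStmt));
    }
}
```

Running `main` prints each rule on its own line with `TOKEN import` and `TOKEN collections` as the leaves, which is the kind of view that makes it obvious which context to hook. `String.repeat` needs Java 11 or later.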

For example, in https://github.com/antlr/grammars-v4/blob/master/python3/examples/base_events.py, the first import sub-tree looks like this:

  ( stmt
    ( simple_stmt
      ( small_stmt
        ( import_stmt
          ( import_name
            ( TOKEN i=2 t=import
            ) 
            ( dotted_as_names
              ( dotted_as_name
                ( dotted_name
                  ( HIDDEN text=\ 
                  ) 
                  ( TOKEN i=4 t=collections
      ) ) ) ) ) ) ) 
      ( TOKEN i=5 t=\r\n
  ) ) ) 

You will need to look at the grammar and verify that your examples really cover it. In base_events.py, import_from is never exercised (https://www.geeksforgeeks.org/import-module-python/), so you'll have to write some examples that use that syntax. Given what you said and what I see, I'd create a listener for the dotted_as_name context, verify that it sits under an import_stmt (an ancestor, not the direct parent), and then just take the text of its first child. enterImport_name() is a fine choice if you don't mind "import", "as", and commas also appearing in the string returned by getText().
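The dotted_as_name idea can be sketched without the generated parser. This is an illustration using a hypothetical `Node` class, not ANTLR code: `collect` plays the role of an `enterDotted_as_name` override, walks up parent links looking for an `import_stmt` (in the dump above it is three levels up), and takes the text of the first child. In a real listener the same steps would go through `ctx.getParent()` and `ctx.getChild(0).getText()` (ANTLR should also generate a `dotted_name()` accessor on the context). The stand-in `getText()` concatenates token text with no separators, as ANTLR's does, which shows why `getText()` on `import_name` mixes in `import`, `as`, and commas.

```java
import java.util.ArrayList;
import java.util.List;

public class ImportNames {
    // Hypothetical stand-in for an ANTLR parse tree node (not ANTLR code).
    static final class Node {
        final String ruleName;   // rule name, or null for a token leaf
        final String tokenText;  // token text for leaves, null for rules
        final List<Node> children = new ArrayList<>();
        Node parent;

        private Node(String ruleName, String tokenText) {
            this.ruleName = ruleName;
            this.tokenText = tokenText;
        }

        static Node rule(String name, Node... kids) {
            Node n = new Node(name, null);
            for (Node k : kids) {
                k.parent = n;
                n.children.add(k);
            }
            return n;
        }

        static Node token(String text) {
            return new Node(null, text);
        }

        // Like ANTLR's getText(): leaf text concatenated with no separators.
        String getText() {
            if (tokenText != null) return tokenText;
            StringBuilder sb = new StringBuilder();
            for (Node c : children) sb.append(c.getText());
            return sb.toString();
        }
    }

    // True if any ancestor of node is a rule with the given name.
    static boolean hasAncestor(Node node, String rule) {
        for (Node p = node.parent; p != null; p = p.parent) {
            if (rule.equals(p.ruleName)) return true;
        }
        return false;
    }

    // For every dotted_as_name under an import_stmt, record its first child's text.
    static void collect(Node node, List<String> out) {
        if ("dotted_as_name".equals(node.ruleName) && hasAncestor(node, "import_stmt")) {
            out.add(node.children.get(0).getText());
        }
        for (Node c : node.children) collect(c, out);
    }

    public static void main(String[] args) {
        // Hand-built tree for: import os.path as p, collections
        Node tree = Node.rule("import_stmt",
            Node.rule("import_name",
                Node.token("import"),
                Node.rule("dotted_as_names",
                    Node.rule("dotted_as_name",
                        Node.rule("dotted_name",
                            Node.token("os"), Node.token("."), Node.token("path")),
                        Node.token("as"), Node.token("p")),
                    Node.token(","),
                    Node.rule("dotted_as_name",
                        Node.rule("dotted_name", Node.token("collections"))))));

        List<String> names = new ArrayList<>();
        collect(tree, names);
        System.out.println(names);                          // [os.path, collections]
        System.out.println(tree.children.get(0).getText()); // importos.pathasp,collections
    }
}
```

Taking the first child keeps the module name and drops the `as p` alias; if you need aliases too, the remaining children of dotted_as_name hold them.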

But I think you get the picture.