
Using the following ANTLR grammar: https://github.com/bkiers/python3-parser/blob/master/src/main/antlr4/nl/bigo/pythonparser/Python3.g4 I want to extract the variables from a given expression, for example:

x.split(y, 3)

or

x + y

In both cases the variables are x and y. How would I achieve this?

I tried the following approach, but it seems cumbersome because I have to list all built-in Python functions myself:

Define a listener and walk the parse tree with it:

const listener = new MyPythonListener()
antlr.tree.ParseTreeWalker.DEFAULT.walk(listener, abstractTree)

Use a regex plus a list of known names to decide whether a node's text is a variable:

const symbolicNames = ['TRUE', 'FALSE', 'NUMBERS', 'STRING', 'LIST', 'TUPLE', 'DICTIONARY', 'INT', 'LONG', 'FLOAT', 'COMPLEX',
'BOOL', 'STR', 'RANGE', 'NONE', 'LEN']

class MyPythonListener extends Python3Listener {
    variables = []

    enterExpr(ctx) {
        const text = this.getElementText(ctx)
        if (text && this.verifyIsVariable(text)) {
            this.variables.push(text)
        }
    }

    verifyIsVariable(leafText) {
        return !leafText.includes('"') && !leafText.includes('\'') && isNaN(leafText) &&
            !symbolicNames.includes(leafText.toUpperCase()) && leafText.match(/^[0-9a-zA-Z_]+$/)
    }
}

Comments:
Bart Kiers: You can't use that grammar to extract variables. You can create an ANTLR grammar based on the grammar/specification you linked to and then use that ANTLR grammar to extract variables. The ANTLR grammar is most likely not a 1-to-1 translation of the specification, so there is no answer to your question without seeing the ANTLR grammar. So, could you post your ANTLR grammar?
Bart Kiers: Btw, it might be easier to use Python's own parser/ast package to retrieve such things from Python code: docs.python.org/3/library/ast.html
user3642381: Thanks for responding, this is the ANTLR grammar I am using: github.com/bkiers/python3-parser. Thanks for sharing this as open source.
Bart Kiers: You're welcome. In the README in that repository, I link to a class that gives an example of how to extract things from the parse tree. Could you edit your own question and add what you have tried yourself?
user3642381: @BartKiers I edited the question and added one approach I tried, using a listener plus pattern matching. I also tried another variant that generates a simplified tree and collects the leaves, but it doesn't look promising. Any suggestion on how you would tackle this is welcome, since I don't have enough experience to come up with a good approach and I can't find appropriate guidance anywhere. Thank you!

1 Answer


I didn't look too closely at it, but after inspecting the parse tree for the Python code:

def some_method_name(some_param_name):
    x.split(y, 3)

it appears that the variable names are children of the atom rule:

atom
 : '(' ( yield_expr | testlist_comp )? ')' 
 | '[' testlist_comp? ']'  
 | '{' dictorsetmaker? '}' 
 | NAME 
 | number 
 | str+ 
 | '...' 
 | NONE
 | TRUE
 | FALSE
 ;

where NAME is a variable name.

So you could do something like this:

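import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

String source = "def some_method_name(some_param_name):\n    x.split(y, 3)\n";
Python3Lexer lexer = new Python3Lexer(CharStreams.fromString(source));
Python3Parser parser = new Python3Parser(new CommonTokenStream(lexer));

// Walk the whole tree and print every NAME token that is a direct child of an atom.
ParseTreeWalker.DEFAULT.walk(new Python3BaseListener() {
    @Override
    public void enterAtom(Python3Parser.AtomContext ctx) {
        // ctx.NAME() is null for the other atom alternatives (numbers, strings, ...).
        if (ctx.NAME() != null) {
            System.out.println(ctx.NAME().getText());
        }
    }
}, parser.file_input());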

which will print:

x
y

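and not the method and parameter names, because those are matched by the funcdef and tfpdef rules rather than by atom. The attribute name split is not printed either, since it sits inside a trailer.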

Again: not thoroughly tested, I leave that for you. You can pretty print the parse tree like this:

String source = "def some_method_name(some_param_name):\n    x.split(y, 3)\n";

// Builder.Tree takes the source text directly, so no separate lexer or parser is needed here.
System.out.println(new Builder.Tree(source).toStringASCII());

to see for yourself where the nodes you're interested in occur in the parse tree. The code above will print:

'- file_input
   |- stmt
   |  '- compound_stmt
   |     '- funcdef
   |        |- def
   |        |- some_method_name
   |        |- parameters
   |        |  |- (
   |        |  |- typedargslist
   |        |  |  '- tfpdef
   |        |  |     '- some_param
   |        |  '- )
   |        |- :
   |        '- suite
   |           |- <NEWLINE>
   |           |- <INDENT>
   |           |- stmt
   |           |  '- simple_stmt
   |           |     |- small_stmt
   |           |     |  '- expr_stmt
   |           |     |     '- testlist_star_expr
   |           |     |        '- test
   |           |     |           '- or_test
   |           |     |              '- and_test
   |           |     |                 '- not_test
   |           |     |                    '- comparison
   |           |     |                       '- star_expr
   |           |     |                          '- expr
   |           |     |                             '- xor_expr
   |           |     |                                '- and_expr
   |           |     |                                   '- shift_expr
   |           |     |                                      '- arith_expr
   |           |     |                                         '- term
   |           |     |                                            '- factor
   |           |     |                                               '- power
   |           |     |                                                  |- atom
   |           |     |                                                  |  '- x
   |           |     |                                                  |- trailer
   |           |     |                                                  |  |- .
   |           |     |                                                  |  '- split
   |           |     |                                                  '- trailer
   |           |     |                                                     |- (
   |           |     |                                                     |- arglist
   |           |     |                                                     |  |- argument
   |           |     |                                                     |  |  '- test
   |           |     |                                                     |  |     '- or_test
   |           |     |                                                     |  |        '- and_test
   |           |     |                                                     |  |           '- not_test
   |           |     |                                                     |  |              '- comparison
   |           |     |                                                     |  |                 '- star_expr
   |           |     |                                                     |  |                    '- expr
   |           |     |                                                     |  |                       '- xor_expr
   |           |     |                                                     |  |                          '- and_expr
   |           |     |                                                     |  |                             '- shift_expr
   |           |     |                                                     |  |                                '- arith_expr
   |           |     |                                                     |  |                                   '- term
   |           |     |                                                     |  |                                      '- factor
   |           |     |                                                     |  |                                         '- power
   |           |     |                                                     |  |                                            '- atom
   |           |     |                                                     |  |                                               '- y
   |           |     |                                                     |  |- ,
   |           |     |                                                     |  '- argument
   |           |     |                                                     |     '- test
   |           |     |                                                     |        '- or_test
   |           |     |                                                     |           '- and_test
   |           |     |                                                     |              '- not_test
   |           |     |                                                     |                 '- comparison
   |           |     |                                                     |                    '- star_expr
   |           |     |                                                     |                       '- expr
   |           |     |                                                     |                          '- xor_expr
   |           |     |                                                     |                             '- and_expr
   |           |     |                                                     |                                '- shift_expr
   |           |     |                                                     |                                   '- arith_expr
   |           |     |                                                     |                                      '- term
   |           |     |                                                     |                                         '- factor
   |           |     |                                                     |                                            '- power
   |           |     |                                                     |                                               '- atom
   |           |     |                                                     |                                                  '- number
   |           |     |                                                     |                                                     '- integer
   |           |     |                                                     |                                                        '- 3
   |           |     |                                                     '- )
   |           |     '- <NEWLINE>
   |           '- <DEDENT>
   '- <EOF>

Note that the Builder.Tree class is not part of the ANTLR library. It lives in my repository, the one you linked to in your question: https://github.com/bkiers/python3-parser/blob/master/src/main/java/nl/bigo/pythonparser/Builder.java
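
If running Python itself is an option, the ast module mentioned in the comments can do the same job without ANTLR. What follows is only a minimal sketch, not tested beyond the two expressions from the question: the function name extract_variables is made up for this example, and filtering against dir(builtins) is an assumption about what should count as "not a variable" (it also hides any variable that shadows a built-in name, and it includes names that are assigned to, not only names that are read).

```python
import ast
import builtins


def extract_variables(source):
    """Return the sorted, de-duplicated identifiers used as plain names in `source`.

    `extract_variables` is a hypothetical helper for this sketch. Attribute
    names (the `split` in `x.split`) are stored as strings on ast.Attribute
    nodes, and function/parameter names live on ast.FunctionDef / ast.arg
    nodes, so none of them show up as ast.Name nodes.
    """
    builtin_names = set(dir(builtins))  # len, int, str, range, ...
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id not in builtin_names:
            names.add(node.id)
    return sorted(names)


print(extract_variables("x.split(y, 3)"))
print(extract_variables("x + y"))
```

Because the built-in names come from the running interpreter, there is no hand-maintained list like symbolicNames to keep up to date.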