1
votes

I understand that the Python compiler compiles Python code into code objects, which contain the bytecode and a few other fields such as the constants used in the code block, the variable names, the stack size, etc. The code objects are then mapped to C functions/precompiled assembly by the interpreter.

I am having trouble connecting this Python behavior with what I learned in my compiler class. In general, a compiler creates some intermediate representation of the source code and stores it in a data structure such as an abstract syntax tree (AST).

My question is: what is the relationship between a Python code object and the AST? If you do code generation on an AST, is a code object what you get? Is it possible to view the AST and manually generate code objects using Python commands?

2
Like most compilers, Python uses an AST in the syntax analysis phase. You can produce one yourself by using the ast module. - bereal
Python does let you manipulate AST objects. Pytest, for example, does this to rewrite asserts. - gilch
The AST is the input to the step that produces a code object. The built-in compile function accepts either source code as text (in which case it first produces an AST) or an AST, and returns a code object. - chepner
@gilch That's interesting! I never thought of that. Can you elaborate a bit on how AST is used by Pytest? - user1559897

2 Answers

3
votes

My question is: what is the relationship between a Python code object and the AST?

The AST is an intermediate stage in the process of compiling Python code. You can compile an AST to bytecode directly, or compile source code to bytecode, which goes by way of the AST.

>>> import ast
>>> my_ast = ast.parse("print('hi')", "<str>", "exec")
>>> my_ast
<_ast.Module object at 0x000001616E96B9A0>

If you do code generation on an AST, is code object what you get?

>>> compile(my_ast, "<str>", "exec")
<code object <module> at 0x000001616EA262F0, file "<str>", line 1>
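
The tree passed to compile does not have to come from ast.parse. As a minimal sketch (the expression and the value bound to x are made up for illustration), an AST can be assembled node by node and compiled the same way:

```python
import ast

# Build the AST for the expression "3 + x" by hand instead of parsing text.
tree = ast.Expression(
    body=ast.BinOp(
        left=ast.Constant(value=3),
        op=ast.Add(),
        right=ast.Name(id="x", ctx=ast.Load()),
    )
)

# Hand-built nodes carry no line/column information, which compile() requires.
ast.fix_missing_locations(tree)

code = compile(tree, "<hand-built>", "eval")
result = eval(code, {"x": 4})
print(result)  # 7
```

ast.fix_missing_locations fills in the position attributes that the parser would normally set.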

Is it possible to view the AST and manually generate code objects using python commands?

>>> ast.dump(my_ast)
"Module(body=[Expr(value=Call(func=Name(id='print', ctx=Load()), args=[Constant(value='hi', kind=None)], keywords=[]))], type_ignores=[])"

See the Green Tree Snakes tutorial for how to manipulate ast.
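
As a rough illustration of what manipulating an AST can look like, here is a small ast.NodeTransformer sketch. The class name SwapAddForMult and the rewrite it performs are hypothetical, chosen only to make the effect visible:

```python
import ast

class SwapAddForMult(ast.NodeTransformer):
    """Hypothetical transformer: rewrites every `a + b` as `a * b`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node

tree = ast.parse("2 + 5", "<str>", "eval")
tree = ast.fix_missing_locations(SwapAddForMult().visit(tree))

result = eval(compile(tree, "<str>", "eval"))
print(result)  # 10
```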

The standard library AST tools improved a lot in Python 3.9: ast.dump accepts an indent argument for pretty-printing, and ast.unparse turns an AST back into a source code string.
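
A short sketch of both features, assuming Python 3.9 or later:

```python
import ast

tree = ast.parse("print('hi')", "<str>", "exec")

# indent= makes ast.dump emit one node per line instead of a single long string.
print(ast.dump(tree, indent=4))

# ast.unparse regenerates source code from the tree.
source = ast.unparse(tree)
print(source)  # print('hi')
```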

1
votes

The AST is the intermediate step between source code and a code object. The built-in compile function can produce a code object from either raw source code:

>>> import ast, dis
>>> program = "3 + x"
>>> dis.dis(compile(program, "", "eval"))
  1           0 LOAD_CONST               0 (3)
              2 LOAD_NAME                0 (x)
              4 BINARY_ADD
              6 RETURN_VALUE

(in which case it just parses the source code into an AST first) or directly from an existing AST:

>>> program_ast = ast.parse(program, "", "eval")
>>> ast.dump(program_ast)
"Expression(body=BinOp(left=Constant(value=3, kind=None), op=Add(), right=Name(id='x', ctx=Load())))"
>>> dis.dis(compile(program_ast, "", "eval"))
  1           0 LOAD_CONST               0 (3)
              2 LOAD_NAME                0 (x)
              4 BINARY_ADD
              6 RETURN_VALUE

Either way, dis.dis shows that compile produces the same bytecode.


(In fact, ast.parse is equivalent to calling compile with the ast.PyCF_ONLY_AST flag, which makes compile return the AST instead of the final bytecode.)
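
A quick way to check that equivalence yourself (the variable names here are only for illustration):

```python
import ast

source = "3 + x"

via_parse = ast.parse(source, "<str>", "eval")
# With PyCF_ONLY_AST, compile() stops after parsing and returns the tree.
via_compile = compile(source, "<str>", "eval", ast.PyCF_ONLY_AST)

print(type(via_compile).__name__)  # Expression
same = ast.dump(via_parse) == ast.dump(via_compile)
print(same)  # True
```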

The AST is where compiler optimizations can be made. For example, given a fragment of an AST like

>>> ast.dump(ast.parse("3 + 5", "", "eval"))
'Expression(body=BinOp(left=Constant(value=3, kind=None), op=Add(), right=Constant(value=5, kind=None)))'

the compiler can recognize that it has enough information to replace this fragment with

'Expression(body=Constant(value=8, kind=None))'

without changing the semantics of the final program.
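
You can observe the effect of this folding from the bytecode. The sketch below assumes a CPython build that folds constants; this is an implementation detail, and the exact stage at which the folding happens may differ between versions:

```python
import dis

code = compile("3 + 5", "<str>", "eval")

# Collect the opcode names; a folded expression has no BINARY_* instruction left.
opnames = [instr.opname for instr in dis.get_instructions(code)]
has_binary_op = any(name.startswith("BINARY") for name in opnames)

print(opnames)
print(has_binary_op)
print(eval(code))  # 8
```

If the addition had survived to runtime, the instruction list would contain a binary-operation opcode, as it does for "3 + x" above.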