Comments can be preserved by merging them back into the generated source code by capturing them with the tokenizer.
Given a toy program in a program variable, we can demonstrate how comments get lost in the AST:
import ast
program = """
# This comment lost
p1v = 4 + 4
p1l = ['a', # Implicit line joining comment for a lost
'b'] # Ending comment for b lost
def p1f(x):
"p1f docstring"
# Comment in function p1f lost
return x
print(p1f(p1l), p1f(p1v))
"""
tree = ast.parse(program)
print('== Full program code:')
print(ast.unparse(tree))
The output shows all comments gone:
== Full program code:
p1v = 4 + 4
p1l = ['a', 'b']
def p1f(x):
"""p1f docstring"""
return x
print(p1f(p1l), p1f(p1v))
However, if we scan the comments with the tokenizer, we can
use this to merge the comments back in:
from io import StringIO
import tokenize
def scan_comments(source):
""" Scan source code file for relevant comments
"""
# Find token for comments
for k,v in tokenize.tok_name.items():
if v == 'COMMENT':
comment = k
break
comtokens = []
with StringIO(source) as f:
tokens = tokenize.generate_tokens(f.readline)
for token in tokens:
if token.type != comment:
continue
comtokens += [token]
return comtokens
comtokens = scan_comments(program)
print('== Comment after p1l[0]\n\t', comtokens[1])
Output (edited to split long line):
== Comment after p1l[0]
TokenInfo(type=60 (COMMENT),
string='# Implicit line joining comment for a lost',
start=(4, 12), end=(4, 54),
line="p1l = ['a', # Implicit line joining comment for a lost\n")
Using a slightly modified version of ast.unparse()
, replacing
methods maybe_newline()
and traverse()
with modified versions,
you should be able to merge back in all comments at their
approximate locations, using the location info from the comment
scanner (start variable), combined with the location info from the
AST; most nodes have a lineno
attribute.
Not exactly. See for example the list variable assignment. The
source code is split out over two lines, but ast.unparse()
generates only one line (see output in the second code segment).
Also, you need to ensure to update the location info in the AST
using ast.increment_lineno()
after adding code.
It seems some more calls to
maybe_newline()
might be needed in the library code (or its
replacement).
inspect.getsource()
returns the source code of an object including comments. Is this what you need? – rchomeast.Assign
;ast.AnnAssign
; andast.FunctionDef
/ast.AsyncFunctionDef
. Inferring types, adding them as a type comment xor annotation; converting between docstring formats (incl. adding/remove types); and updating thereturn
attribute of a function definition. – Samuel Marks