TypeScript Compiler API loses formatting during transformation

Question

I have to modify around 1,000 TypeScript files in a specific way: I need to replace every StringLiteral and JsxText token with a CallExpression that calls a translation function, in order to internationalize my application.

I've already done this for our C# codebase with Roslyn, so now I'm trying to accomplish a similar task with the TypeScript compiler API. It's quite similar to the Roslyn API, but there is one nasty difference. Roslyn has a notion of trivia tokens: tokens that don't emit anything interesting but are essential for readability, such as whitespace, tabs, and comments. A Roslyn syntax tree contains all the trivia from your source file, so when you change the C# syntax tree in some way and emit source code back from it, you keep all the same formatting, comments, whitespace, and so on.

Unfortunately, there aren't any trivia tokens in a TypeScript AST, so when I use code like this, all my formatting goes away.

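import * as ts from "typescript";

// transformerFactory and visitorFunction are my own helpers (not shown here).
const result: ts.TransformationResult<ts.SourceFile> = ts.transform(
  sourceFile, [ transformerFactory(visitorFunction) ]
);

const transformedSourceFile: ts.SourceFile = result.transformed[0];

const printer: ts.Printer = ts.createPrinter();

// Printing regenerates the text from the AST, which discards the original layout.
const generated: string = printer.printNode(ts.EmitHint.SourceFile, transformedSourceFile, sourceFile);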

What options do I have?

I can stick with the approach described above, but it will lead to lots of useless edits, a spoiled GitHub history, and a gigantic pull request. With this approach I should definitely run Prettier after my transformations, and I should probably install it as a developer dependency and in our CI so we don't have such problems in the future.
I can still use the AST to detect my tokens, but perform the transformation without ts.Printer and ts.transform. I can collect all the literals to process during the detection phase, order them by their position in the file, descending, and replace them using substring or something similar. This is quite tricky and I don't really want to do it, but I'm not happy with the downsides of the first option.
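
For illustration, the second option could be sketched roughly as follows. This is a minimal sketch and not a complete tool: `TextEdit`, `applyEdits`, `wrapInTranslate`, and the `translate` function name are hypothetical, and the offsets are found with `indexOf` only to keep the example self-contained. In a real run they would presumably come from `node.getStart(sourceFile)` and `node.getEnd()` for each literal found while walking the AST.

```typescript
// A text edit: replace source[start, end) with `replacement`.
interface TextEdit {
  start: number;       // inclusive offset of the literal
  end: number;         // exclusive offset of the literal
  replacement: string;
}

// Apply non-overlapping edits from the end of the text towards the start,
// so the offsets of earlier edits stay valid while later text changes length.
function applyEdits(source: string, edits: TextEdit[]): string {
  const sorted = [...edits].sort((a, b) => b.start - a.start);
  let result = source;
  let previousStart = source.length;
  for (const edit of sorted) {
    if (edit.start < 0 || edit.start > edit.end || edit.end > previousStart) {
      throw new Error(`Overlapping or invalid edit at ${edit.start}-${edit.end}`);
    }
    result = result.slice(0, edit.start) + edit.replacement + result.slice(edit.end);
    previousStart = edit.start;
  }
  return result;
}

// Hypothetical helper: wrap the original literal text in a call to `translate`.
function wrapInTranslate(source: string, start: number, end: number): TextEdit {
  return { start, end, replacement: `translate(${source.slice(start, end)})` };
}

// Example: two string literals, with a comment and odd spacing to preserve.
const source = 'const a = "hi"; // keep me\nconst b =   "bye";';
const firstStart = source.indexOf('"hi"');
const secondStart = source.indexOf('"bye"');
const output = applyEdits(source, [
  wrapInTranslate(source, firstStart, firstStart + '"hi"'.length),
  wrapInTranslate(source, secondStart, secondStart + '"bye"'.length),
]);
console.log(output);
```

Everything outside the replaced spans, including the comment and the extra spaces, is copied through untouched, which is the point of this approach. JsxText would likely need a different replacement (for example, an expression container with a quoted string), so that case is left out of the sketch.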

So what should I do? Do I have some other options?

TypeScript actually retains the notion of trivia, I guess it is just not as clearly exposed in the API. You may want to look at: basarat.gitbook.io/typescript/overview/ast/…. Here is an example where you can see that AST retains the trivia but you have to manually retrieve it: ts-ast-viewer.com/#code/…. Taking a look at using ts.createFormattingScanner() may also be useful. — Josh Bowden

Ira Baxter · Accepted Answer · 2019-01-21T04:16:20

You could use tools that capture the formatting and comments, and regenerate them on completion of the transformation process, as you note that Roslyn does. However, Roslyn and the TypeScript "compiler" are specific to their target languages.

In general, what you want is a "program transformation system". These are tools that accept grammars, automatically build ASTs that capture all that formatting data, allow you to define transformations using source-level patterns, execute these transformations by matching/patching the ASTs, and then prettyprint the modified tree, preserving that formatting data.

Our DMS Software Reengineering Toolkit can do that.

One has to define the target language grammar to it; we've done that for many languages, including JavaScript, but not yet for TypeScript. However, you can build language dialects on top of other definitions. Or you can do TypeScript from scratch; it isn't hard if you have an explicit grammar, which I think exists for TypeScript. Part of that definition tells the parser how to recognize comments so they can be saved; DMS knows how to save all the formatting and layout data.

With that, to solve your particular task, you could actually write very simple transformations using DMS rewrite rules:

source domain ECMAScript~TypeScript; -- assuming TypeScript is built as a dialect
target domain ECMAScript~TypeScript; -- we're defining rules that map TypeScript to itself
    -- you could write rules that map TypeScript to C++ if you insist

rule InternationalizeStringLiteral(s:STRINGLITERAL): primary-> primary
  = "\s"-> "Translate(\s)";

rule InternationalizeJsText(jst:JSTText): primary -> primary
  = " \jst " -> "Translate(\jst)";

ruleset Internationalize = { InternationalizeStringLiteral, InternationalizeJsText};

You can ask DMS to parse a file, apply the ruleset bottom up to your tree, then prettyprint the result.

These rules are completely syntax aware, because they operate on the ASTs, so they are not fooled by text in comments or string literals, or line boundaries/whitespace/formats/interwoven comments, ...

Now, you have 1000 files to change. This is big enough that it might be worth the effort to define TypeScript and apply DMS. (It would be a slam dunk if the TypeScript front end for DMS were ready: just do the above.) Sometimes it is not; YMMV depending on what you really want to do. DMS is best used on large code bases, and really shines if you have complex transformations to make.
