variation in code generation, using ANTLR4 parse tree visitors

Question

I am writing a transpiler (myLang -> JS) using ANTLR (JavaScript target with a visitor).
My focus is on generating the target code from the parse tree,
specifically on how to deal with variations in the source code.

To make the question clearer, consider the two variations below:

source#1:
PRINT 'hello there'

source#2:

varGreeting = 'hey!'

PRINT varGreeting

In case 1 I am dealing with a string, while in case 2 it's a variable. The JS target code therefore needs to differ (see below): case 1 with quotes, case 2 without.

target#1 (JS):

console.log("hello there");   // <-- string

target#2 (JS):

var varGreeting = "hey!";
console.log(varGreeting);  // <-- var

How can I best disambiguate and generate different code? My first thought was to use the rule name (ID, STRLIT) to tell the two usages apart.
But I couldn't find these exposed in the RuleContext API. I looked at the Java docs, assuming the JS runtime works the same way.

getText() gives the value ('hello there', varGreeting), but no meta/attribute info that I can leverage.

I dug into the tree/ctx object and didn't find them in an easily consumable form.

Question: how best to go about this without building ugly hacks? A transpiler seems to be squarely within ANTLR's use cases, so am I missing something?

(relevant part of) Grammar:

print : PRINTKW (ID | STRLIT) NEWLINE;

STRLIT: '\'' .*? '\'' ;
ID    : [a-zA-Z0-9_]+;

Visitor override:

// sample code for generating code for case 1 (with quotes) 
myVisitor.prototype.visitPrint = function(ctx) {
    // this is the part which needs different treatment for case 1 and 2
    const Js = `console.log("${ctx.getChild(1).getText()}");`;

    // write to file
    fs.writeFile(targetFs + fileName + '.js', Js, 'utf8', function (err) {
        if (err) return console.log(err);
        console.log(`done`);
    });

    return this.visitChildren(ctx);
};

using ANTLR 4.8

I'm unclear on what your code looks like and how/from where you're accessing the argument to print. If you could post the current code of your listener or visitor (maybe with a comment like "// here's where I want to know whether it's an ID or STRLIT"), that would probably make it clearer for me. — sepp2k

sepp2k · Accepted Answer · 2020-03-21T10:18:47

You're using getChild(1) to access the argument of the print statement. This gives you a TerminalNode containing either an ID or a STRLIT token. You can get the token with the getSymbol() method and then read its type from the .type property. The type is a number that you can compare against constants such as MyLanguageParser.ID or MyLanguageParser.STRLIT.
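As a rough sketch of that token-type check: the snippet below runs without ANTLR by using small stubs. TokenTypes, fakeTerminal and fakePrintCtx are hypothetical stand-ins for the generated parser's constants and context objects, and printToJs is just an illustrative helper name. In a real visitor you would compare against the constants on your generated parser instead.

```javascript
// Hypothetical stand-ins for the generated parser's token-type constants
// (in a real project: MyLanguageParser.ID, MyLanguageParser.STRLIT).
const TokenTypes = { ID: 1, STRLIT: 2 };

// Minimal stub of a TerminalNode: getSymbol() returns the token, getText() its text.
function fakeTerminal(type, text) {
    return {
        getSymbol: () => ({ type, text }),
        getText: () => text,
    };
}

// Minimal stub of a PrintContext whose children are [PRINTKW, argument, NEWLINE].
function fakePrintCtx(argNode) {
    const children = [fakeTerminal(0, 'PRINT'), argNode, fakeTerminal(0, '\n')];
    return { getChild: (i) => children[i] };
}

// Decide how to emit the print argument based on the token type.
function printToJs(ctx) {
    const node = ctx.getChild(1);
    const text = node.getText();
    const type = node.getSymbol().type;
    switch (type) {
        case TokenTypes.STRLIT:
            // Strip the source language's single quotes and re-quote for JS.
            return `console.log("${text.substring(1, text.length - 1)}");`;
        case TokenTypes.ID:
            // Variables are emitted as-is, without quotes.
            return `console.log(${text});`;
        default:
            throw new Error(`unexpected token type ${type}`);
    }
}

console.log(printToJs(fakePrintCtx(fakeTerminal(TokenTypes.STRLIT, "'hello there'"))));
console.log(printToJs(fakePrintCtx(fakeTerminal(TokenTypes.ID, 'varGreeting'))));
```

Note that this sketch does not escape double quotes inside the literal, so a source string containing " would need extra handling.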

Using getChild isn't necessarily the best way to access a node's children though. Each context class will have specific accessors for each of its children.

Specifically the PrintContext object will have methods ID() and STRLIT(). One of them will return null, the other will return a TerminalNode object containing the given token. So you know whether it was an ID or string literal by seeing which one isn't null.
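A minimal sketch of the accessor approach, again with a stub so it runs on its own: fakePrintCtx is a hypothetical stand-in for the generated PrintContext, and printArgToJs is an illustrative helper name. The only assumption about the real API is what is described above, namely that ID() and STRLIT() are methods and that the one that did not match returns null.

```javascript
// Hypothetical stub of a PrintContext: ID() and STRLIT() mimic the generated
// accessors, returning a node for the token that matched and null for the other.
function fakePrintCtx({ id = null, strlit = null }) {
    const node = (text) => (text === null ? null : { getText: () => text });
    return {
        ID: () => node(id),
        STRLIT: () => node(strlit),
    };
}

// Build the JS expression for the print argument by checking which accessor is non-null.
function printArgToJs(ctx) {
    const str = ctx.STRLIT(); // call the accessor; the method itself is never null
    if (str !== null) {
        const text = str.getText();
        // Strip the single quotes and re-quote with double quotes.
        return `"${text.substring(1, text.length - 1)}"`;
    }
    return ctx.ID().getText();
}

console.log(`console.log(${printArgToJs(fakePrintCtx({ strlit: "'hello there'" }))});`);
console.log(`console.log(${printArgToJs(fakePrintCtx({ id: 'varGreeting' }))});`);
```

Inside visitPrint the same check would be done directly on ctx, so no numeric token types need to be compared at all.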

That said, the more common solution would be to not have a union of possible kinds of arguments in the print rule, but instead allow any kind of expression as an argument to print. You can then use labelled alternatives in your expression rule to get different visitor methods for each kind of expression:

print : PRINTKW expression NEWLINE;

expression
    : STRLIT #StringLiteral
    | ID #Variable
    ;

Then your visitor could look like this:

myVisitor.prototype.visitPrint = function(ctx) {
    const arg = this.visit(ctx.expression());
    const Js = `console.log(${arg});`;

    // write to file
    fs.writeFile(targetFs + fileName + '.js', Js, 'utf8', function (err) {
        if (err) return console.log(err);
        console.log(`done`);
    });
};

myVisitor.prototype.visitStringLiteral = function(ctx) {
    const text = ctx.getText();
    return `"${text.substring(1, text.length - 1)}"`;
}

myVisitor.prototype.visitVariable = function(ctx) {
    return ctx.getText();
}

Alternatively you could leave out the labels and instead define a visitExpression method that handles both cases by seeing which getter returns null:

myVisitor.prototype.visitExpression = function(ctx) {
    // STRLIT is an accessor method, so it has to be called; the method itself is never null
    if (ctx.STRLIT() !== null) {
        const text = ctx.getText();
        return `"${text.substring(1, text.length - 1)}"`;
    } else {
        return ctx.getText();
    }
}

PS: Do note that single quotes work just fine in JavaScript, so you don't actually need to strip the single quotes and replace them with double quotes. You could just use .getText() without any post-processing in both cases and that'd still come out as valid JavaScript.
