ANTLR function body indentation

Question

I'm trying to parse C++ code using the ANTLR grammar at https://github.com/antlr/grammars-v4/tree/master/cpp. I want to get the source code of each function, so I overrode visitFunctionBody, as you can see in the code below:

#include <fstream>   // std::ifstream, used in main()
#include <iostream>
#include <antlr4-runtime.h>

#include "parser/CPP14Lexer.h"
#include "parser/CPP14BaseVisitor.h"
#include "parser/CPP14Parser.h"
#include "parser/CPP14Visitor.h"


class TREEVisitor : public CPP14BaseVisitor {
    public:
        // No TREEVisitor:: qualifier here: it is not allowed on an in-class declaration.
        virtual antlrcpp::Any visitFunctionBody(
            CPP14Parser::FunctionBodyContext *ctx) override
        {
            std::cout << ctx->getText() << std::endl;
            return visitChildren(ctx);
        }
};


int main(int argc, char *argv[]) {

    std::ifstream stream;
    stream.open(argv[1]);
    antlr4::ANTLRInputStream input(stream);
    CPP14Lexer lexer(&input);
    antlr4::CommonTokenStream tokens(&lexer);
    CPP14Parser parser(&tokens);
    antlr4::tree::ParseTree *tree = parser.translationunit();

    // Visitor
    TREEVisitor visitor;   // stack object, so nothing is leaked
    visitor.visit(tree);

    return 0;
}

I tried it on this very basic C++ code:

void foo()
{
    char buf[10];
    int i = 10;
    int b = i * 2;
    return b * i;
}

The output of my ANTLR visitor function is the body of the sample function, but without any whitespace, newlines, or indentation:

{charbuf[10];inti=10;intb=i*2;returnb*i;}

How can I get the source code of the function I'm parsing exactly as it appears in the source file?

In my use case, I parse a large C++ file and want to match the results of my parsing against the actual source code.

Thanks

kaby76 · Accepted Answer · 2021-02-26T19:35:14

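Here is one way, but there are others. In CPP14Lexer.g4, change "-> skip" to "-> channel(HIDDEN)", so that whitespace and newline tokens stay in the token stream instead of being discarded (the parser still ignores them). Then, in visitFunctionBody(), change the call "getText()" to "myGetText(ctx)", and define the routine myGetText() like this, but for C++. This code is in Java; it assumes "tokens" is the token stream that was passed to the parser.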

public String myGetText(ParseTree node) {
    if (node.getChildCount() == 0) {
        Token t = ((TerminalNodeImpl)node).getSymbol();
        List<Token> tokensBefore = tokens.getHiddenTokensToLeft(t.getTokenIndex(), Token.HIDDEN_CHANNEL);
        String pre = "";
        if (tokensBefore != null) {
            StringBuilder builder2 = new StringBuilder();
            for (Token token : tokensBefore) {
                CharStream input = token.getInputStream();
                String s = input.getText(Interval.of(token.getStartIndex(),token.getStopIndex()));
                builder2.append(s);
            }
            pre = builder2.toString();
        }
        String s2 = node.getText();
        String ss = pre + s2;
        return ss;
    }
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < node.getChildCount(); i++) {
        String s = myGetText(node.getChild(i));
        builder.append(s);
    }
    return builder.toString();
}
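
For readers who want to see the same idea in C++, the following is a minimal, self-contained sketch of the logic only; it is not a port against the ANTLR C++ runtime. The Tok struct and the functions hiddenTextToLeft and rebuild are hypothetical names made up for this illustration. In real code the token list would come from the token stream and the hidden flag from each token's channel.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified token. A real ANTLR token carries more state.
struct Tok {
    std::string text;
    bool hidden;  // true when the lexer routed the token to the hidden channel
};

// Concatenate the run of hidden tokens sitting immediately to the left of
// tokens[index]. This plays the role of getHiddenTokensToLeft() in the Java code.
std::string hiddenTextToLeft(const std::vector<Tok> &tokens, std::size_t index) {
    std::size_t first = index;
    while (first > 0 && tokens[first - 1].hidden) {
        --first;
    }
    std::string out;
    for (std::size_t i = first; i < index; ++i) {
        out += tokens[i].text;
    }
    return out;
}

// Rebuild the text of the visible tokens in [from, to] (inclusive), each one
// preceded by the hidden tokens on its left, as myGetText() does per leaf.
std::string rebuild(const std::vector<Tok> &tokens, std::size_t from, std::size_t to) {
    assert(from <= to && to < tokens.size());
    std::string out;
    for (std::size_t i = from; i <= to; ++i) {
        if (tokens[i].hidden) {
            continue;  // hidden tokens are emitted together with the next visible one
        }
        out += hiddenTextToLeft(tokens, i) + tokens[i].text;
    }
    return out;
}
```

As in the Java version, the result also includes any hidden text directly before the first visible token, so a function body usually comes back with the whitespace that precedes its opening brace.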

You can also reconstruct the text without changing "skip" to "channel(HIDDEN)" by querying the char stream directly for all the characters between the first and the last token of the node.

static void Reconstruct(ParseTree node, Parser parser)
{
    var ct = (ParserRuleContext)node;
    Token ta = ct.getStart();
    Token tb = ct.getStop();
    var input_stream = ta.getInputStream();
    var start = ta.getStartIndex();
    var stop = tb.getStopIndex();
    System.out.println(input_stream.getText(new Interval(start, stop)));
}
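
The slicing step can be sketched in C++ without any ANTLR dependency. This is an illustration under stated assumptions, not the runtime's API: TokenSpan and originalText are hypothetical names, and the two offsets stand in for the values that getStartIndex() on the first token and getStopIndex() on the last token return in the code above. The one detail worth noting is that the stop index is inclusive, so the length is stop - start + 1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a token: only its character offsets matter here.
struct TokenSpan {
    std::size_t startIndex;  // offset of the token's first character
    std::size_t stopIndex;   // offset of the token's last character (inclusive)
};

// Return the original text covered by a rule: from the first character of its
// first token to the last character of its last token, both inclusive.
// Everything in between, including whitespace and comments, is kept as is.
std::string originalText(const std::string &source,
                         const TokenSpan &first,
                         const TokenSpan &last) {
    assert(first.startIndex <= last.stopIndex);
    assert(last.stopIndex < source.size());
    return source.substr(first.startIndex,
                         last.stopIndex - first.startIndex + 1);
}
```

Because the text is cut straight out of the input, this variant does not depend on how the lexer treats whitespace, which is why the grammar can keep "-> skip".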
