
I'm trying to use the ANTLR3 C Target to make sense of an AST, but am running into some difficulties.

I have a simple SQL-like grammar file:

grammar sql;
options 
{
    language = C;
    output=AST;
    ASTLabelType=pANTLR3_BASE_TREE; 
}
sql :   VERB fields;
fields  :   FIELD (',' FIELD)*;
VERB    :   'SELECT' | 'UPDATE' | 'INSERT';
FIELD   :   CHAR+;
fragment
CHAR    :   'a'..'z';

and this works as expected within ANTLRWorks.

In my C code I have:

#include <iostream>
#include "sqlLexer.h"    // generated by ANTLR from grammar sql
#include "sqlParser.h"

const char pInput[] = "SELECT one,two,three";
// sizeof(pInput) counts the trailing '\0'; pass only the characters to be lexed
pANTLR3_INPUT_STREAM pNewStrm = antlr3NewAsciiStringInPlaceStream((pANTLR3_UINT8) pInput, sizeof(pInput) - 1, NULL);
psqlLexer lex = sqlLexerNew(pNewStrm);
pANTLR3_COMMON_TOKEN_STREAM tstream = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, TOKENSOURCE(lex));
psqlParser ps = sqlParserNew(tstream);
sqlParser_sql_return ret = ps->sql(ps);
pANTLR3_BASE_TREE pTree = ret.tree;
std::cout << "Tree: " << pTree->toStringTree(pTree)->chars << std::endl;
ParseSubTree(0, pTree);

Recursing through this tree with ->getChildCount and ->children->get shows a flat structure:

void ParseSubTree(int level, pANTLR3_BASE_TREE pTree)
{
    ANTLR3_UINT32 childcount = pTree->getChildCount(pTree);

    // Print each child of pTree (not pTree itself), indented by level
    for (ANTLR3_UINT32 i = 0; i < childcount; i++)
    {
        pANTLR3_BASE_TREE pChild = (pANTLR3_BASE_TREE) pTree->children->get(pTree->children, i);
        for (int j = 0; j < level; j++)
        {
            std::cout << " - ";
        }
        std::cout << pChild->getText(pChild)->chars << std::endl;

        // Recurse into children that have children of their own
        if (pChild->getChildCount(pChild) > 0)
        {
            ParseSubTree(level + 1, pChild);
        }
    }
}

Program output:

Tree: SELECT one , two , three
SELECT
one
,
two
,
three

Now, if I alter the grammar file:

sql :   VERB ^fields;

With this change, the call to ParseSubTree displays only the child nodes of fields.

Program output:

Tree: (SELECT one , two , three)
one
,
two
,
three

My question is: why, in the second case, does ANTLR give me only the child nodes (in effect leaving out the SELECT token)? I'd be very grateful for any pointers on making sense of the tree returned by ANTLR.

Useful Information: AntlrWorks 1.4.2, Antlr C Target 3.3, MSVC 10


2 Answers

2
votes

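Placing output=AST; in the options section does not by itself produce a structured AST; it only causes ANTLR to create CommonTree nodes instead of CommonTokens (or, in your case, the equivalent C structs). Without tree operators, a rule with several elements returns them as a flat list of children under a nil root.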

If you use output=AST;, the next step is to add tree operators (such as ^) or rewrite rules (->) to your parser rules to give your AST its shape.

See this previous Q&A to find out how to create a proper AST.

For example, the following grammar (with rewrite rules):

options {
  output=AST;
  // ...
}

sql                        // make VERB the root
  :  VERB fields        -> ^(VERB fields) 
  ;

fields                     // omit the commas from the AST
  :  FIELD (',' FIELD)* -> FIELD+
  ;

VERB  : 'SELECT' | 'UPDATE' | 'INSERT';
FIELD : CHAR+;
SPACE : ' ' {$channel=HIDDEN;};
fragment CHAR : 'a'..'z';

will parse the following input:

UPDATE         field,     foo  ,  bar

into the following AST:

(UPDATE field foo bar): UPDATE is the root, and field, foo and bar are its children.

2
votes

I think it is important to realize that the tree you see in ANTLRWorks is not the AST. The .tree in your code is the AST, and it may look different from what you expect. To shape the AST, you need to mark root nodes with the ^ operator in strategic places, or use rewrite rules.

You can read more here