I am using ANTLR 3 for the task described below.

Assume I have an SQL query. I know that, in general, its WHERE, GROUP BY and ORDER BY clauses are optional. In terms of ANTLR's grammar I would describe that like this:

query : select_clause from_clause where_clause? group_by_clause? order_by_clause?

The rule for each clause will obviously start with the respective keyword.

What I actually need is to extract each clause's contents as a string without dealing with its internal structure.

To do this I started with the following grammar:

query :
    select_clause from_clause where_clause? group_by_clause? order_by_clause?
EOF;

select_clause :
    SELECT_CLAUSE
;

from_clause :
    FROM_CLAUSE
;

where_clause :
    WHERE_CLAUSE
;

group_by_clause :
    GROUP_BY_CLAUSE
;

order_by_clause :
    ORDER_BY_CLAUSE
;

SELECT_CLAUSE : 'select' ANY_CHAR*;

FROM_CLAUSE : 'from' ANY_CHAR*;

WHERE_CLAUSE : 'where' ANY_CHAR*;

GROUP_BY_CLAUSE : 'group by' ANY_CHAR*;

ORDER_BY_CLAUSE : 'order by' ANY_CHAR*;

ANY_CHAR : .;

WS : ' '+ {skip();};

This one didn't work. I made further attempts at composing a correct grammar, without success. I suspect this task is doable with ANTLR 3 and I am just missing something.

More generally, I would like to collect characters from the input stream into a single token until a specific keyword is met that marks the beginning of a new token. That keyword should be part of the new token.

Can you help me please?

1 Answer

Rather than putting ANY_CHAR* inside your lexer rules, why not move it into the parser rules? You could even "glue" the resulting single-character tokens together using a rewrite rule.

A quick demo:

grammar T;

options { output=AST; }
tokens  { QUERY; ANY; }

query           : select_clause from_clause where_clause? group_by_clause? order_by_clause? EOF
                  -> ^(QUERY select_clause from_clause where_clause? group_by_clause? order_by_clause?)
                ;
select_clause   : SELECT_CLAUSE^ any;
from_clause     : FROM_CLAUSE^ any;
where_clause    : WHERE_CLAUSE^ any;
group_by_clause : GROUP_BY_CLAUSE^ any;
order_by_clause : ORDER_BY_CLAUSE^ any;
any             : ANY_CHAR* -> ANY[$text];

SELECT_CLAUSE   : 'select';
FROM_CLAUSE     : 'from';
WHERE_CLAUSE    : 'where';
GROUP_BY_CLAUSE : 'group' S+ 'by';
ORDER_BY_CLAUSE : 'order' S+ 'by';
ANY_CHAR        : . ;
WS              : S+ {skip();};

fragment S      : ' ' | '\t' | '\r' | '\n';

If you now parse the input:

select JUST ABOUT ANYTHING from YOUR BASEMENT order by WHATEVER

the following AST would be created:

[Image: the resulting AST, with QUERY as the root and one subtree per clause found in the input]
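To see what moving ANY_CHAR* into the parser amounts to, here is a rough Python imitation of the grammar above. It is only a sketch under stated assumptions, not ANTLR output: the names `tokenize` and `parse_query` are made up for illustration, the lexer is approximated by a single ordered regular expression, and the tree is flattened to a list of (keyword, text) pairs.

```python
import re

# One alternative per lexer rule of grammar T, tried in this order.
# [ \t\r\n] mirrors the fragment rule S.
_LEXER = re.compile(
    r"(?P<SELECT_CLAUSE>select)"
    r"|(?P<FROM_CLAUSE>from)"
    r"|(?P<WHERE_CLAUSE>where)"
    r"|(?P<GROUP_BY_CLAUSE>group[ \t\r\n]+by)"
    r"|(?P<ORDER_BY_CLAUSE>order[ \t\r\n]+by)"
    r"|(?P<WS>[ \t\r\n]+)"
    r"|(?P<ANY_CHAR>.)",
    re.DOTALL,
)

# Clause order from the `query` rule; the last three are optional.
_CLAUSES = ["SELECT_CLAUSE", "FROM_CLAUSE", "WHERE_CLAUSE",
            "GROUP_BY_CLAUSE", "ORDER_BY_CLAUSE"]
_OPTIONAL = {"WHERE_CLAUSE", "GROUP_BY_CLAUSE", "ORDER_BY_CLAUSE"}


def tokenize(text):
    """Return (type, text) pairs; WS tokens are dropped, like skip()."""
    return [(m.lastgroup, m.group())
            for m in _LEXER.finditer(text)
            if m.lastgroup != "WS"]


def parse_query(text):
    """Return [(keyword, glued_text), ...] for each clause that is present."""
    tokens = tokenize(text)
    pos = 0
    clauses = []
    for clause in _CLAUSES:
        if pos < len(tokens) and tokens[pos][0] == clause:
            keyword = tokens[pos][1]
            pos += 1
            start = pos
            # The `any` rule: consume ANY_CHAR* and glue the pieces together.
            while pos < len(tokens) and tokens[pos][0] == "ANY_CHAR":
                pos += 1
            clauses.append((keyword, "".join(t for _, t in tokens[start:pos])))
        elif clause not in _OPTIONAL:
            raise ValueError("missing " + clause)
    if pos != len(tokens):
        raise ValueError("unexpected token %r" % (tokens[pos][1],))
    return clauses
```

In this sketch the glued text contains no spaces, because whitespace is thrown away before the parser sees it. If the spacing inside a clause matters to you, check what the generated ANTLR parser returns for `$text` and adjust the WS rule accordingly.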

Trying to do something similar in your lexer would be messy, and would mean some custom code (or predicates) to check for keywords up ahead in the char-stream (both not pretty!).
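For comparison, this is roughly what the "custom code" route looks like when written by hand. It is a plain Python sketch rather than an ANTLR lexer, and `KEYWORDS`, `keyword_at` and `clause_tokens` are hypothetical names; it also assumes exactly one space inside "group by" and "order by".

```python
KEYWORDS = ("select", "from", "where", "group by", "order by")


def keyword_at(text, i):
    """Look ahead: return the keyword that starts at index i, or None."""
    for kw in KEYWORDS:
        if text.startswith(kw, i):
            return kw
    return None


def clause_tokens(text):
    """Collect characters into one token per clause, keyword included."""
    tokens = []
    current = None  # text of the clause being collected, or None before the first keyword
    i = 0
    while i < len(text):
        kw = keyword_at(text, i)
        if kw is not None:
            # A keyword closes the previous token and opens a new one.
            if current is not None:
                tokens.append(current.strip())
            current = kw
            i += len(kw)
        elif current is None:
            # Only whitespace is tolerated before the first keyword.
            if not text[i].isspace():
                raise ValueError("input must start with a keyword")
            i += 1
        else:
            current += text[i]
            i += 1
    if current is not None:
        tokens.append(current.strip())
    return tokens
```

Even this small version shows the problem: the lookahead has no notion of word boundaries, so an identifier such as "fromage" is cut in the wrong place, and every fix (boundaries, flexible whitespace, string literals) adds more hand-written checks.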