1
votes

I am struggling to understand how this SQL Grammar ever can be used to resolve the SQL statement presented below from a parser. I am stuck at the 'Table reference' and 'join' construction´s found after when WHERE token.

BNF: http://www.contrib.andrew.cmu.edu/~shadow/sql/sql2bnf.aug92.txt

<table reference> ::=
         <table name> [ <correlation specification> ]
     | <derived table> <correlation specification>
     | <joined table>

 <joined table>    ::=
         <cross join>
     | <qualified join>
     | <left paren> <joined table> <right paren>

<cross join>    ::=
         <table reference> CROSS JOIN <table reference>

<qualified join>    ::=
         <table reference> [ NATURAL ] [ <join type> ] JOIN <table reference> [ <join specification> ]

<join type>    ::=
         INNER
     | <outer join type> [ OUTER ]
     | UNION

<outer join type>    ::=   LEFT | RIGHT | FULL

<join specification>    ::=   <join condition> | <named columns join>

<join condition>    ::=   ON <search condition>

<named columns join>    ::=   USING <left paren> <join column list> <right paren>

SQL:

SELECT p.Name, v.Name
FROM Production.Product p
JOIN Purchasing.ProductVendor pv
ON p.ProductID = pv.ProductID
JOIN Purchasing.Vendor v
ON pv.BusinessEntityID = v.BusinessEntityID
WHERE ProductSubcategoryID = 15
ORDER BY v.Name; 

Jump to the FROM clause. Here you have one TableName, which is followed by two JOINs.

If you have a look at 'Table reference' then you see that this contains 'Table name'. This I can comprehend.

<table reference>    ::=
         **<table name> [ <correlation specification> ]**
     | <derived table> <correlation specification>
     | <joined table>

But to get the join the parser must reach the 'Joined table' which it can't because it all ready read the 'table name'.

To reach the join the parser must reach 'Qualified join', but it can't because there's no repetition in the BNF. If it somehow reached the 'Join table' element then if would be quite disappointed because the next read would be 'Table reference' again and then it would reach 'Qualifed join' again, and.... then you get a stack overflow.

  <joined table>    ::=
         <cross join>
     | <**qualified join>**
     | <left paren> <joined table> <right paren>

**<qualified join>**    ::=
         <table reference> [ NATURAL ] [ <join type> ] JOIN <table reference> [ <join specification> ]

What am I not getting here? I'm sure there's a trick to it but I just don't see it.

I hope that some of you can explain to me what going in on with this to me impossible grammar.

How can one ever construct a let's say a recursive descent parser to solve this Grammar?

What steps and/or rules does the parser need to follow?

2
I'm not very familiar with what you're doing but... what I'm reading is that <table name> represents a query with only one table in it, whereas <joined table> represents a query with joined tables in it. But how you apply the logic to recognise this beforehand I'm not sure.Nick.McDermaid
Exactly. How do you tell a recursiv decent paser to select the correct path? Another problem for me is also how the BNF supports nested joins? It's probably simple with the right explanation.Brian Andersen
Why do you believe that the grammar must be parsable with a recursive descent parser?rici
I have no idea if it is or it isn't possible to parse this grammar with a recursive decent parser? I myself is trying to implement it with such a parser.Brian Andersen
I don't quite understand: is this a proven parser that you are trying to understand or are you building the parser yourself? in other words is the solution to this a better understanding of the BNF or is the solution a change to the BNF or is it something else? @rici makes a good point - I don't think you can parse SQL with a serial approach, you might need to identify components from a high level and drill in on those piecesNick.McDermaid

2 Answers

1
votes

That grammar is not LL(1), which is what you need to build a recursive-descent parser. I doubt whether there is an LL(1) grammar for SQL, and particularly if there is one which can generate a correct parse tree. Fortunately, that doesn't matter, since there are more powerful parsing technologies.

It is likely that you could use that grammar to build an LALR(1) parser, using a tool like bison/yacc. Or see the sqlite source code, which includes both an LALR(1) grammar and an LALR(1) parser generator called "lemon".

0
votes

It's been a while since this question was asked but i'm thinking about rewriting my ANTLR4 SQL parser into RDP due to performance/memory issues with ANTLR.

So, the way i (mentally) solve the issue of JOINs is that i treat them like operators, because that's what they are (at least that's my impression).

Basic expression grammar suits both usual expressions and JOINs:

ex
    : '(' ex ')'
    | ex OPERATOR ex
    | TERMINAL
    ;

In the JOIN case your OPERATOR becomes JOIN and TERMINAL becomes table reference.

Now, this kind of left-recursive grammar isn't an LL grammar but it can be transformed into one, which ANTLR4 does automatically.