1
votes

We're building a data warehouse in BigQuery where we generate a large amount of data marts using standard sql statements. These can be quite large and complex. To track data lineage across a chain of dependencies, we'd like to automatically parse the SQL statements and get all the output columns, matched up with the input table.column(s).

Simple example:

SELECT t1.a, t2.b, t1.a + t2.b AS c FROM table1 t1 JOIN table2 t2 ON t1.a = t2.a

Should end up giving us:

Input Output table1.a a table2.b b table1.a c table1.b c

We've tried using this: https://www.npmjs.com/package/node-sql-parser, but it comes up short in some of our complex scenarios.

Is there any library available in any language which supports parsing a SQL statement and returning the AST for the full standard SQL grammar?

1

1 Answers

0
votes

You can use google/zetasql, which is what BigQuery uses for parsing StandardSQL.