I have a set of historical data tables with different but 'compatible' schemas: the schema has evolved over time and fields have been added (never removed). I'd like to run a query over the union of all of them.
Doing so using legacy SQL requires multiple SELECT statements joined by UNION, with dummy NULL placeholders for the newer fields that did not exist in the earlier tables.
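For illustration, each of these looks roughly like the following (table and column names are made up; in legacy SQL the comma between table expressions acts as UNION ALL):

```sql
#legacySQL
SELECT a, b, c
FROM
  -- c was added after 2016, so the older shard pads it with NULL
  (SELECT a, b, NULL AS c FROM [my_dataset.events_2016]),
  (SELECT a, b, c FROM [my_dataset.events_2017])
```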
I have 100 of these combined queries to build and execute.
I could script this, but I was hoping to use Standard SQL wildcard tables instead. However, I read the following at https://cloud.google.com/bigquery/docs/reference/legacy-sql#tablewildcardfunctions :
> In order to execute a standard SQL query that uses a wildcard table, BigQuery automatically infers the schema for that table. BigQuery uses the schema for the most recently created table that matches the wildcard as the schema for the wildcard table. If the schema is inconsistent across tables matched by the wildcard table, BigQuery returns an error.
I'm not sure how strictly 'inconsistent' is meant here. Would wildcard tables support an incrementally extended schema as in the use case above, or do all matched tables need to share exactly the same schema?
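For context, the wildcard query I was hoping to write looks like this (project, dataset, and table names are made up):

```sql
#standardSQL
SELECT *
FROM `my-project.my_dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '2016' AND '2018'
```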
If that is not possible, an alternative would be to (re)load from the source into a new table using schemaUpdateOptions with ALLOW_FIELD_ADDITION, but that means redesign and rework.
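A minimal sketch of what that reload could look like with the Python client (the bucket paths, source format, and table names below are assumptions, not my real setup):

```python
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,  # assumption
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    # Let each batch add the columns that first appear in it.
    schema_update_options=[bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION],
)

# Hypothetical source paths, loaded oldest to newest into one consolidated table.
for uri in ["gs://my-bucket/events/2016/*.json",
            "gs://my-bucket/events/2017/*.json"]:
    client.load_table_from_uri(
        uri, "my-project.my_dataset.events_all", job_config=job_config
    ).result()  # block until the load job finishes
```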
Any other suggestions are welcome. FYI: I use Airflow to execute the queries, so I'd prefer a Pythonic solution.
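To make the scripted fallback concrete, this is roughly what I'd generate from Airflow; a minimal sketch, assuming made-up shard names and that the newest shard carries the widest schema:

```python
from google.cloud import bigquery

client = bigquery.Client()

dataset = "my-project.my_dataset"  # hypothetical
shards = ["events_2016", "events_2017", "events_2018"]

# Assume the newest shard has the full (widest) column list.
all_cols = [f.name for f in client.get_table(f"{dataset}.{shards[-1]}").schema]

selects = []
for shard in shards:
    present = {f.name for f in client.get_table(f"{dataset}.{shard}").schema}
    # Untyped NULL coerces to the column's type across UNION ALL branches.
    cols = [c if c in present else f"NULL AS {c}" for c in all_cols]
    selects.append(f"SELECT {', '.join(cols)} FROM `{dataset}.{shard}`")

union_query = "\nUNION ALL\n".join(selects)
```

The resulting query string could then be handed to a BigQuery operator task, but building and maintaining 100 of these is exactly what I'd like to avoid.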