When updating a view, indirect schema changes do not seem to be picked up.
Steps to reproduce
- Create
view1
with a fieldfield1
(e.g.SELECT 1 AS field1
) - Create
view2
selecting all fields fromview1
- Update
view1
to also includefield2
(e.g.SELECT 1 AS field1, 2 AS field2
) - Update
view2
with the same query as before (due to documented limitation)
Desired outcome
- Schema for
view1
andview2
includingfield1
andfield2
- View updates should be atomic
Actual outcome
- Schema for
view1
correctly updated (includingfield1
andfield2
) - Schema for
view2
is only includingfield1
- Selecting from
view2
does actually returnfield1
andfield2
I could delete view2
and re-create it again but that wouldn't be atomic and there would be a time where the view is not available which is undesired.
I also tried to update the schema property of view2
but that is refused with Cannot add fields (field: field2)
:
google.api_core.exceptions.BadRequest: 400 PATCH https://www.googleapis.com/bigquery/v2/projects/<project-id>/datasets/dataset1/tables/view2: Provided Schema does not match Table <project-id>:dataset1.view2. Cannot add fields (field: field2)
Question
Is there any way to update the view atomically while also updating the schema which was changed indirectly (a table/view the view selects from).
Note: of course my view2 would add additional fields and I could currently determine its schema by creating a new temporary view.
Note: the schema is important because tools such as the Data Studio's BigQuery connector are inspecting the schema.
Code to reproduce the steps
# Python 3.6+
import google.api_core.exceptions
from google.cloud import bigquery
def delete_table_if_exists(client: bigquery.Client, table: bigquery.Table):
try:
client.delete_table(table)
except google.api_core.exceptions.NotFound:
pass
def full_table_id(table: bigquery.Table) -> str:
# Note: the documentation says it should be separated by a dot but uses a colon
return table.full_table_id.replace(':', '.')
def view_test():
client = bigquery.Client()
dataset_ref = client.dataset('dataset1')
try:
client.create_dataset(dataset_ref)
except google.api_core.exceptions.Conflict:
pass
view1 = bigquery.Table(dataset_ref.table('view1'))
view2 = bigquery.Table(dataset_ref.table('view2'))
delete_table_if_exists(client, view1)
delete_table_if_exists(client, view2)
view1.view_query = 'SELECT 1 AS field1'
view1 = client.create_table(view1)
view2.view_query = f'SELECT * FROM `{full_table_id(view1)}`'
client.create_table(view2)
view1.view_query = 'SELECT 1 AS field1, 2 AS field2'
client.update_table(view1, ['view_query'])
client.update_table(view2, ['view_query'])
print('view2 schema:', client.get_table(view2).schema)
# trying to update the schema fails with 'Cannot add fields (field: field2)'
view2.schema = client.get_table(view1).schema
client.update_table(view2, ['schema'])
if __name__ == '__main__':
view_test()
Bash example doing the same
#!/bin/bash
set -e
project_id=$(gcloud config list --format 'value(core.project)' 2>/dev/null)
bq mk -f dataset1
bq rm -f dataset1.view1
bq rm -f dataset1.view2
bq mk --use_legacy_sql=false --view 'SELECT 1 AS field1' dataset1.view1
bq mk --use_legacy_sql=false --view 'SELECT * FROM `'$project_id'.dataset1.view1`' dataset1.view2
bq update --use_legacy_sql=false --view 'SELECT 1 AS field1, 2 AS field2' dataset1.view1
bq update --use_legacy_sql=false --view 'SELECT * FROM `'$project_id'.dataset1.view1`' dataset1.view2
bq show dataset1.view2
Update: Code with accepted answer
Python code
def get_create_or_replace_view_query(view: bigquery.Table) -> str:
return f'CREATE OR REPLACE VIEW {view.dataset_id}.{view.table_id} AS {view.view_query}'
def view_test():
# ...
query_job = client.query(get_create_or_replace_view_query(view2))
query_job.result()
print('view2 schema:', client.get_table(view2).schema)
Bash magic
bq query --use_legacy_sql=false 'CREATE OR REPLACE VIEW dataset1.view2 AS SELECT * FROM `'$project_id'.dataset1.view1`'
update_table
method seems to be using thePATCH
HTTP method. I've added a more detailed exception which shows that (when I try to update the schema). – de1