1
votes

Related question: Bigquery add columns to table schema using BQ command line tools

I want to add a new column to existing tables (update the existing table's schema) in BigQuery using BigQuery Python API.

However my code seems not working.

Here's my code:

    flow = flow_from_clientsecrets('secret_key_path', scope='my_scope')
    storage = Storage('CREDENTIAL_PATH')
    credentials = storage.get()
    if credentials is None or credentials.invalid:
        credentials = tools.run_flow(flow, storage, tools.argparser.parse_args([]))
    http = httplib2.Http()
    http = credentials.authorize(http)
    bigquery_service = build('bigquery', 'v2', http=http)
    tbObject = bigquery_service.tables()
    query_body = {'schema': {'name':'new_column_name', 'type':'STRING'}}
    tbObject.update(projectId='projectId', datasetId='datasetId', tableId='tableId', body=query_body).execute()

it returns Provided schema doesn't match existing table's schema error. Can anyone give me a working Python example? Many thanks!

2
you should provide whole schema, not just new fieldMikhail Berlyant
Hi Mikhail, do you mean the whole schema of existing table?Charles Chow
yes, with new field. so api will be responsible for adding new one. but for this to happen the whole schema needs to be presentedMikhail Berlyant
Hi Mikhail, I have one more question, I know there's no SET and UPDATE methods for BQ, just wondering if I add a new column to existing table, how canI set the value of the new column for existing tables? Or I have to drop the existing table and create a new table with column values?Charles Chow
Ah I see, I'll try it again with whole schemaCharles Chow

2 Answers

2
votes

summary of my comments (as i've got some minutes now for this):

  • whole schema (along with new field) needs to be supplied to api
  • new field will be added with null for existing rows. no way to set value
  • you can have some logic in queries that you will be running against this table to compensate this. or you can have separate table with just this new field and some key that you will be joining your existing table with new table to get this field
4
votes

Base on Mikhail Berlyant comments, I have to pass existing table's schema with new field (column) to the update() method to update the existing tables's schema.

A python code example is given below:

...
tbObject = bigquery_service.tables()
# get current table schema
table_data = tbObject.get(projectId=projectId, datasetId=datasetId, tableId=tableId).execute()
schema = table_data.get('schema')
new_column = {'name': 'new_column_name', 'type': 'STRING'}
# append new field to current table's schema
schema.get('fields').append(new_column)
query_body = {'schema': schema}
tbObject.update(projectId='projectId', datasetId='datasetId', tableId='tableId', body=query_body).execute()

And also, there's no way to set value of new columns for existing rows (tables). Thanks for Mikhail Berlyant suggestion, the way to set the value for existing rows is to create a seperate table for new columns with values, and join the existing table with that table to replace the old schema table