2
votes

We are writing code to 'augment' our BigQuery tables with additional metadata in an automated way by using the Python API

We do this by using the key-value labeling system (works fine), and updating the column descriptions by:

  • Per table, iterate over the schema columns
  • Do something with the column metadata
  • Add the column to a new schema
  • Update the schema

However, the last step gives an error when working with tables that having RECORD type fields. When trying to update the schema I receive the following error on the 1st nested field inside the RECORD field:

SchemaField(u'nestedField', u'integer', u'NULLABLE', None, ()) is not JSON serializable

The code looks like this:

orig_table = bigquery_client.get_table(table)
schema = list(orig_table.schema)

new_schema = list()

for column in schema:
        columntxt = *do something to define the column metadata* 

        new_schema.append(bigquery.SchemaField(column.name, column.field_type, column.fields,  description=columntxt ))

orig_table.schema = new_schema
table = bigquery_client.update_table(orig_table, ['schema', 'labels'])

How to properly resolve this / make the code work with nested schema tables?

1

1 Answers

2
votes

The way you built your request is sending wrong parameters to SchemaField constructor (which in turn can't be encoded into JSON).

Try this instead:

for column in schema:
    columntxt = *do something to define the column metadata* 
    new_schema.append(bigquery.SchemaField(
        column.name,
        column.field_type,
        fields=column.fields,
        description=columntxt,
        mode=column.mode))

orig_table.schema = new_schema
table = bigquery_client.update_table(orig_table, ['schema'])