1 vote

Background

I am loading files from my local machine into BigQuery. Each file has a variable number of fields, so I am using autodetect=True when running the load job.

The issue is: when the load job runs for the first time and the destination table doesn't exist, BigQuery creates the table by inferring the fields present in the file, and that becomes the new table's schema.

Now, when I run a load job with a different file that contains some extra fields (e.g. "Middle Name": "xyz"), BigQuery throws an error saying the field doesn't exist in the table.

From this post: BigQuery : add new column to existing tables using python BQ API, I learnt that columns can be added dynamically. However, what I don't understand is:

Query

How will my program know that the file being uploaded contains extra fields and that a schema mismatch will occur? (This is not a problem if the table doesn't exist, because a new table will be created.)

If my program could somehow infer the extra fields present in the file being uploaded, I could add those columns to the existing table and then run the load job.

I am using the Python BQ API.

Any thoughts on how to automate this process would be helpful.


2 Answers

3 votes

You should check the schema update options. There is an option named ALLOW_FIELD_ADDITION that will help you.
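
With the google-cloud-bigquery client, this option is passed through LoadJobConfig.schema_update_options. A minimal sketch, assuming a newline-delimited JSON file and placeholder project/dataset/table names:

from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.my_table"  # placeholder

job_config = bigquery.LoadJobConfig(
    autodetect=True,
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    # Allow the load job to add newly detected columns to the table schema.
    schema_update_options=[bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION],
)

with open("data.json", "rb") as source_file:
    load_job = client.load_table_from_file(
        source_file, table_id, job_config=job_config
    )
load_job.result()  # wait for the load to complete

Note that schema update options only apply when appending to an existing table (or overwriting a partition); they are ignored when the table is created.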

0 votes

A naive solution would be:

1. Get the target table's schema using

service.tables().get(projectId=projectId, datasetId=datasetId, tableId=tableId).execute()

2. Generate the schema of the data in your file.

3. Compare the two schemas (a kind of "diff") and add the columns that are extra in your file's schema to the target table, as sketched below.
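
Here is a sketch of steps 1 and 3 with the newer google-cloud-bigquery client (the service.tables().get call above is from the older discovery-based API). add_missing_columns is a hypothetical helper, and file_schema is assumed to be the list of SchemaField objects you inferred from the file in step 2:

from google.cloud import bigquery

def add_missing_columns(client, table_id, file_schema):
    # Step 1: fetch the target table and its current schema.
    table = client.get_table(table_id)
    existing = {field.name for field in table.schema}

    # Step 3: the "diff" -- fields present in the file but not in the table.
    extra = [f for f in file_schema if f.name not in existing]
    if extra:
        table.schema = list(table.schema) + extra
        client.update_table(table, ["schema"])  # patch only the schema
    return extra

Once the helper returns, the load job can be run as usual, since the table now contains every column present in the file.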

Any better ideas or approaches would be highly appreciated!