I am using the BigQuery client libraries to perform a data ETL jpb, then load data back into BigQuery.
I'd like to overwrite the destination table every time, but currently my code is appending new data to the table every time it is run. I've read the documentation on job_config, and I have used this to set parameters for queries, but I can't figure out how to set a write disposition for the query.
Here is what I have tried so far:
roc_df = pd.DataFrame(roc_score)
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
dataset_ref = client.dataset('Customers')
table_ref = dataset_ref.table('propensity_scores_test')
client.load_table_from_dataframe(roc_df, table_ref, job_config=job_config).result()
And I also tried this format:
query_config = bigquery.QueryJobConfig(
query_parameters=[
bigquery.job.WriteDisposition('WRITE_TRUNCATE')
]
)
But both are currently returning the error:
BadRequest: 400 POST https://www.googleapis.com/upload/bigquery/v2/projects/my_project/jobs?uploadType=resumable: Required parameter is missing
Hoe can I write my data out and replace the table each time?
train = client.query(training_query).to_dataframe()
without any config set - I only need to use a config later, to set the write rules, perhaps this is confusing things? – Ben P