1
votes

I'm working on BigQuery tables with the Python SDK and I want to achieve something that seems doable, but can't find anything in the documentation.

I have a table T partitioned by date, and I have a SELECT request that computes values over the X last days. In T, I would like to replace the partitions of the X last days with these values, without affecting the partitions older than X days.

Here is how we do for replacing one partition only :

job_config = bigquery.QueryJobConfig()
job_config.destination = dataset.table("{}${}".format(table, date.strftime("%Y%m%d")))
job_config.use_legacy_sql = False
job_config.write_disposition = bigquery.job.WriteDisposition.WRITE_TRUNCATE

query_job = bigquery.job.QueryJob(str(uuid.uuid4()), query, client, job_config)
query_job.result()

I tried to go like this :

 job_config.destination = dataset.table(table))

But it truncates all partitions, even those older than X days.

Is there a way to do this easily ? Or do I have to loop over each partition of the interval ?

Thanks

1

1 Answers

1
votes

I don't think you can achieve it by playing with destination table.

Not considering the cost, what you can do with SQL is

DELETE FROM your_ds.your_table WHERE partition_date > DATE_SUB(CURRENT_DATE(), INTERVAL X DAY);

Then

INSERT INTO your_ds.your_table SELECT (...)

Cost

The first DELETE will cost:

The sum of bytes processed for all the columns referenced in all partitions for the tables scanned by the query + the sum of bytes for all columns in the modified or scanned partitions for the table being modified (at the time the DELETE starts).

The second INSERT INTO should cost the same as your current query.