7
votes

I have a Day-Partitioned Table on BigQuery. When I try to delete some rows from the table using a query like:

DELETE FROM `MY_DATASET.partitioned_table` WHERE id = 2374180

I get the following error:

Error: DML statements are not yet supported over partitioned tables.

A quick Google search leads me to: https://cloud.google.com/bigquery/docs/loading-data-sql-dml where it also says: "DML statements that modify partitioned tables are not yet supported."

So for now, is there a workaround that we can use in deleting rows from a partitioned table?

4

4 Answers

5
votes

DML has some known issues/limitation in this phase.

Such as:

  • DML statements cannot be used to modify tables with REQUIRED fields in their schema.
  • Each DML statement initiates an implicit transaction, which means that changes made by the statement are automatically committed at the end of each successful DML statement. There is no support for multi-statement transactions.
  • The following combinations of DML statements are allowed to run concurrently on a table: UPDATE and INSERT
    DELETE and INSERT
    INSERT and INSERT
    Otherwise one of the DML statements will be aborted. For example, if two UPDATE statements execute simultaneously against the table then only one of them will succeed.
  • Tables that have been written to recently via BigQuery Streaming (tabledata.insertall) cannot be modified using UPDATE or DELETE statements. To check if the table has a streaming buffer, check the tables.get response for a section named streamingBuffer. If it is absent, the table can be modified using UPDATE or DELETE statements.
  • DML statements that modify partitioned tables are not yet supported.

Also be aware of the quota limits

  • Maximum UPDATE/DELETE statements per day per table: 48
  • Maximum UPDATE/DELETE statements per day per project: 500
  • Maximum INSERT statements per day per table: 1,000
  • Maximum INSERT statements per day per project: 10,000

What you can do is copy the entire partition to a non-partitioned table and execute the DML statement there. Than write back the temp table to the partition. Also if you ran into DML update limit statements per day per table, you need to create a copy of the table and run the DML on the new table to avoid the limit.

2
votes

You could delete partitions in partitioned tables using the command-line bq rm, like this:

bq rm 'mydataset.mytable$20160301'
1
votes

I've already done it without temporary table, steps:

1) prepare query which selects all the rows from particular partition which should be kept:

SELECT * FROM `your_data_set.tablename` WHERE 
_PARTITIONTIME = timestamp('2017-12-07') 
AND condition_to_keep_rows_which_shouldn't_be_deleted = 'condition' 

if necessary run this for other partitions

2) choose Destination table for result of your query where you point TO THE PARTICULAR PARTITION, you need to provide table name like this:

tablename$20171207

3) Check option "Overwrite table" -> it will overwrite only particular partition

4) Run Query, as a result from pointed partition redundant rows will be deleted!

//remember that you could need run this for other partitions, where you rows to deleted are spread across more than one partition

0
votes

Looks like as of my writing, this is no longer a BigQuery limitation!

In standard SQL, a statement like the above, over a partitioned table, will succeed, assuming rows being deleted weren't recently (within last 30 minutes) inserted via a streaming insert.

Current docs on DML: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-manipulation-language

Example Query that worked for me in the BQ UI:

DELETE 
FROM dataset_name.partitioned_table_on_timestamp_column
WHERE 
timestamp >= '2020-02-01' AND timestamp < '2020-06-01'

After the hamsters are done spinning, we get the BQ response:

This statement removed 101 rows from partitioned_table_on_timestamp_column