In BigQuery I want to update a column in one of my tables, and it seems that doing so costs as much as a full table scan (FTS) of the table.
Let's say I have a 1 TB table employee
with two columns, first_name and last_name
(512 GB each), and I want to set the first_name of all employees to 'john', like below:
UPDATE my_dataset.employee
SET first_name = 'john'
WHERE TRUE # a WHERE clause is mandatory for DML
How much does this cost?
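To make the question concrete, here is the arithmetic I would naively expect versus what the dry run reports (a sketch; the 512 GB per column comes from my table description above):

```python
# Sizes from the example above: two columns, 512 GB each.
GB = 1024 ** 3
first_name_bytes = 512 * GB
last_name_bytes = 512 * GB

# What I'd naively expect an UPDATE touching only first_name to bill:
expected = first_name_bytes                   # 512 GB

# What the dry run actually reports: every column is scanned,
# as if the statement were a full table scan (SELECT *):
actual = first_name_bytes + last_name_bytes   # 1 TB

print(expected / GB, actual / GB)  # 512.0 1024.0
```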
From this documentation link (relevant part quoted below):
DML statement 1: Updating a single partition
mytable2 has two columns: id of type INTEGER and ts of type TIMESTAMP. mytable has two columns: field1 of type INTEGER and field2 of type STRING.
UPDATE project.mydataset.mytable T
SET T.field1 = T.field1 + 100
WHERE T._PARTITIONTIME = TIMESTAMP("2017-05-01")
AND EXISTS (SELECT S.id
FROM project.mydataset.mytable2 S
WHERE S.id = T.field1)
Bytes processed in this example =
sum of the number of bytes in mytable2.id + sum of the number of bytes in mytable.field1 in the "2017-05-01" partition + sum of the number of bytes in mytable.field2 in the "2017-05-01" partition
So updating one column of project.mydataset.mytable
for one partition costs as much as reading all of its columns (SELECT *) for that same partition. Why is that?
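Translating the documented formula into code with invented per-column sizes (the byte counts below are hypothetical, purely to show the accounting):

```python
# Hypothetical byte counts for the docs example
# (these numbers are made up for illustration).
mytable2_id_bytes = 10 * 10**6                # mytable2.id, whole table
mytable_field1_partition_bytes = 40 * 10**6   # mytable.field1, "2017-05-01" partition
mytable_field2_partition_bytes = 160 * 10**6  # mytable.field2, "2017-05-01" partition

# Per the documentation, the UPDATE bills all three terms, even though
# only field1 is modified and field2 never appears in the statement:
bytes_processed = (mytable2_id_bytes
                   + mytable_field1_partition_bytes
                   + mytable_field2_partition_bytes)
print(bytes_processed)  # 210000000
```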
I've tried the same query both in the UI and via the CLI; the price (bytes processed) doesn't change, and it is equivalent to a SELECT * over the same partitions:
bq query --dry_run '<my query>'