1
votes

I have this BigQuery table with three columns and every row can have same values of the previous one.

For example:

| col_a | col_b | col_c
+-------+-------+------------
| 123   |   3   | 2019-12-12
| 123   |   3   | 2019-12-12
| 234   |  11   | 2019-10-12

Now I want to add new column named col_d with a UUID in it.

The problem is that when I try to execute UPDATE command I have no way to update only one row at a time (because some rows have the same values and I want different UUID in each one of them).

Things I tried with no luck :(

LIMIT

UPDATE table 
SET col_d = GENERATE_UUID() 
LIMIT 1

I thought to get all rows and then traverse them with an update command. But there's not LIMIT on UPDATE commands in BigQuery.

ROW_NUMBER

UPDATE table 
SET col_d = ROW_NUMBER() OVER()

But BigQuery doesn't allow to use analytic function in Update command

INSERT

I can query all rows, and insert them with a UUID and then delete all the old ones that has no UUID. that approach will work and it will be my final gateaway but I believe there's a better way so I'm asking here.

Any other idea or advice will be welcome.

3

3 Answers

3
votes

Below is for BigQuery Standard SQL and produces different UUID for each and every row no matter how duplicate they are

UPDATE `project.dataset.table`
SET col_d = GENERATE_UUID()
WHERE TRUE   

Note: based on your "Insert and then Delete" option - I assume that col_d already exists in your table - otherwise you would not be able to do DELETE FROM table WHERE col_d IS NULL as you mentioned in your comments

1
votes

You can SELECT the data with a UUID as a fourth column (col_d) and then save that data as a new table.

SELECT col_ac, col_b, col_c, GENERATE_UUID() AS col_d
FROM table

This will generate the output you desire:

| col_a | col_b | col_c       | col_d
+-------+-------+-------------+------------------------------------------
| 123   |   3   | 2019-12-12  | e3784e4d-59bb-433b-a9ac-3df318e0f675
| 123   |   3   | 2019-12-12  | 430d034a-6292-4f5e-b1b0-0ee5550af3f6
| 234   |  11   | 2019-10-12  | 3e7e14d2-3077-4030-a704-5a2b7fc3c11e

Since BigQuery does not allow adding a column with data like traditional SQL the following should create a new table with the UUID values added.

CREATE OR REPLACE TABLE table AS 
SELECT *, GENERATE_UUID() AS col_d
FROM table

Be warned that the table history may be deleted so back it up first. One should always backup data prior to doing such updates in all cases as undesired outcomes do arise.

0
votes

because some rows have the same values and I want different UUID in each one of them

This should do what you want:

UPDATE table 
    SET col_d = GENERATE_UUID() 

I don't understand why you would be using limit, if you want to update all rows.

That said, BigQuery has restrictions on UPDATEs, so the CREATE TABLE approach suggested by fromthehills seems more appropriate.