0
votes

I have a main table and a staging table in GCP Bigquery. The main table will have the snapshot data and the staging table will be a streaming table. I want to run merge query and update the data in the main table at a certain intervals.

As I will be using the MongoDB Debezium connector, I will only be receiving updated columns for a row and the rest columns will be null.

The normal merge query will update the entire row which is undesirable.

I need an approach to merge the rows from the tables only for those specific columns and keep the rest of the columns untouched. The columns to be updated can be different for each row.

For eg :

Main table data:
id  status  login_id    task_id user_id
71  CLAIMED 13          4373737 2191


Staging table data:
id  status  login_id    task_id user_id
71  null    null        4636282 null

I want result something like this:

id  status  login_id    task_id user_id
71  CLAIMED 13          4636282 2191

The query must insert a record for a new key and update a specific column for the existing key.

Is it possible? Anyone can help me with that?

1
does my answer resolve your question? - AnkurSaxena

1 Answers

1
votes

Below query should work. Instead of a single clause to update entire row on WHEN MATCHED, you'll need to create one match clause per column. WHEN NOT MATCHED BY TARGET portion stays the same to insert new entry.

merge main T 
using staging S
on T.id = S.id
when matched and S.status is not null then update set T.status = cast(S.status as string)
when matched and S.login_id is not null then update set T.login_id = S.login_id
when matched and S.task_id is not null then update set T.task_id = S.task_id
when matched and S.user_id is not null then update set T.user_id = S.user_id
when not matched by target then insert values (S.id, cast(S.status as string), S.login_id, S.task_id, S.user_id)

I tested it and it works. You may need to cast some columns as bigquery will complain if S has null value and T is int for a column.