I'm trying to write a Merge query in Google BigQuery (part of an ETL process).
I have Source (staging) and Target tables and I have 2 ways of merge the data: the classic 'Upsert' Merge OR Insert new row if not matched all columns.
This is an example of the first way (the classic 'Upsert') query:
MERGE DS.Target T
USING DS.Source S
ON T.Key=S.Key
WHEN NOT MATCHED THEN
INSERT ROW
WHEN MATCHED THEN
UPDATE SET Col1 = S.Col1, Col2 = S.Col2
in that way if the key exist it always updates the values of the cols even if value are the same. also this will work only if the key is not Nullable.
The other way of doing it is to inserting new row when values not matched:
MERGE DS.Target T
USING DS.Source S
ON T.A = S.A and T.B = S.B and T.C = S.C
WHEN NOT MATCHED THEN
INSERT ROW
I prefer this way, BUT I found that its not possible when column type is NULL, because NULL != NULL and then the condition is false when values are Null.
I can't find a proper way of writing this query and handle Nulls comparison.
It's not possible to check for Nulls at the merge condition, Ex:
ON ((T.A IS NULL and S.A IS NULL) or T.A = S.A)
WHEN NOT MATCHED THEN
INSERT ROW
Error message:
RIGHT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
It's not possible also to use the Target table reference at the WHERE
clause, Ex:
ON T.A = S.A
WHEN NOT MATCHED AND
S.A IS NOT NULL AND T.A IS NOT NULL
THEN
INSERT ROW
What do you suggest? Also, lets say both ways are possible, what would be more cost effective by BQ? I guess the performance should be the same. I also assume that I can ignore the insertions cost. Thanks!
coalesce()
?coalesce(T.A, '') = coalesce(S.A, '')
. Choose the default value appropriate for your data and the data type. I also wonder if this would work:coalesce( x = y, x is null and y is null ) is true
. - Gordon Linoff