I need to do a merge statement in BigQuery using a classic flat table, having as target a table with nested and repeated fields, and I'm having trouble understanding how this is supposed to work. Google's examples use direct values, so the syntax here is not really clear to me.
Using this example:
CREATE OR REPLACE TABLE
mydataset.DIM_PERSONA (
IdPersona STRING,
Status STRING,
Properties ARRAY<STRUCT<
Id STRING,
Value STRING,
_loadingDate TIMESTAMP,
_lastModifiedDate TIMESTAMP
>>,
_loadingDate TIMESTAMP NOT NULL,
_lastModifiedDate TIMESTAMP
);
INSERT INTO mydataset.DIM_PERSONA
values
('A', 'KO', [('FamilyMembers', '2', CURRENT_TIMESTAMP(), TIMESTAMP(NULL))], CURRENT_TIMESTAMP(), TIMESTAMP(NULL)),
('B', 'KO', [('FamilyMembers', '4', CURRENT_TIMESTAMP(), TIMESTAMP(NULL)),('Pets', '1', CURRENT_TIMESTAMP(), NULL)], CURRENT_TIMESTAMP(), TIMESTAMP(NULL))
;
CREATE OR REPLACE TABLE
mydataset.PERSONA (
IdPersona STRING,
Status STRING,
IdProperty STRING,
Value STRING
);
INSERT INTO mydataset.PERSONA
VALUES('A', 'OK','Pets','3'),('B', 'OK','FamilyMembers','5'),('C', 'OK','Pets','2')
The goal is to:
- Update IdPersona='A', adding a new element in Properties and changing Status
- Update IdPersona='B', updating the existent element in Properties
- Insert IdPersona='C'
This INSERT works:
MERGE INTO mydataset.DIM_PERSONA TRG
USING (
SELECT
IdPersona,
Status,
ARRAY(
SELECT AS STRUCT
IdProperty,
Value,
CURRENT_TIMESTAMP(),
TIMESTAMP(NULL)
) Properties,
CURRENT_TIMESTAMP(),
TIMESTAMP(NULL)
FROM mydataset.PERSONA
) SRC ON TRG.IdPersona=SRC.IdPersona
WHEN NOT MATCHED THEN
INSERT VALUES (IdPersona, Status, Properties, CURRENT_TIMESTAMP(), TIMESTAMP(NULL))
But I would like to build the nested/repeated fields in the INSERT clause, because for the UPDATE I would also need (I think) to do a "SELECT AS STRUCT * REPLACE" by comparing the values of TRG with SRC. This doesn't work:
MERGE INTO mydataset.DIM_PERSONA TRG
USING (
SELECT
*
FROM mydataset.PERSONA
) SRC ON TRG.IdPersona=SRC.IdPersona
WHEN NOT MATCHED THEN
INSERT VALUES (
IdPersona,
Status,
ARRAY(
SELECT AS STRUCT
IdProperty,
Value,
CURRENT_TIMESTAMP(),
TIMESTAMP(NULL)
),
CURRENT_TIMESTAMP(),
TIMESTAMP(NULL)
)
I get "Correlated Subquery is unsupported in INSERT clause."
Even if I used the first option, I don't get how to reference TRG.properties in the UPDATE..
WHEN MATCHED THEN
UPDATE
SET Properties = ARRAY(
SELECT AS STRUCT p_SRC.*
REPLACE (IF(p_SRC.IdProperty=p_TRG.id AND p_SRC.Value<>p_TRG.Value,p_SRC.Value,p_TRG.Value) AS Value)
FROM SRC.Properties p_SRC, TRG.Properties p_TRG
)
Obv this is wrong though.
One way to solve this, as I see it, is to pre-join everything in the USING clause, therefore doing all the replacement there, but it feels very wrong for a merge statement.
Can anyone help me figure this out, please? :\