snowflake merge statement using golden gate json as source table

0

votes

while executing target table in snowflake using json data as source table

merge into cust tgt using (
select parse_json(s.$1):application_num as application num 
from prd_json s qualify 
row_number() over(partition application  
order_by application desc)=1) src 
on tgt.application =src.application 
when not matched and op_type='I' then 
insert(application) values (src.application );

qualify commands ignores all the duplicate data present and gives only unique record but while putting joins its show only less records when compare to normal select statement. for example :

select distinct application  
from prd_json  where op_type='I';

--15000 rows are there while putting joins it shows there is not matching records in target . if it is not matched it should insert all 15000rows but 8500 rows only inserting even though it was not an duplicate record . is there any function available without using "qualify" shall we insert the record. if i ignore qualify am getting dml error duplication. pls guide me if anyone knows.

snowflake-cloud-data-platform

you SQL seems invalid: SELECT '1' as application num; feel like it should be as application_num but everything else refers to application But then how does your distinct SQL work? - Simeon Pilgrim

0

votes

How about using SELECT DISTINCT?

0

votes

You demo SQL does not compile. and you using the $1 means it's also hard to guess the names of your columns to know how the ROW_NUMBER is working. So it's hard to nail down the problem.

But with the following SQL you can replace ROW_NUMBER with DISTINCT

CREATE TABLE cust(application INT);

CREATE OR REPLACE table prd_json as 
    SELECT parse_json(column1) as application, column2 as op_type
    FROM VALUES 
    ('{"application_num":1,"other":1}', 'I'),
    ('{"application_num":1,"other":2}', 'I'),
    ('{"application_num":2,"other":3}', 'I'),
    ('{"application_num":1,"other":1}', 'U')
;

MERGE INTO cust AS tgt 
USING (
    SELECT DISTINCT  
        parse_json(s.$1):application_num::int as application,
        s.op_type
    FROM prd_json AS s 
) AS src 
    ON tgt.application = src.application 
WHEN NOT MATCHED AND src.op_type = 'I' THEN 
    INSERT(application) VALUES (src.application );

number of rows inserted
2

SELECT * FROM cust;

APPLICATION
1
2

running the MERGE code a second time gives:

number of rows inserted
0

Now if truncate CUST and I swap to using this SQL for the inner part:

    SELECT --DISTINCT  
        parse_json(s.$1):application_num::int as application,
        s.op_type
    FROM prd_json AS s 
    qualify row_number() over (partition by application order by application desc)=1

I get three rows inserted, because the partition by application, is effectively binding to the s.application not the output application, and there are 3 different "applications" because of the other values.

The reason I wrote my code this way is your

select distinct application  
from prd_json  where op_type='I';

implies there is something called application already, in the table.. and thus it runs the chance of being used in the ROW_NUMBER statement..

Anyways, there is a large possible problem is you also have "update data" I guess U in your transaction block, that you want to ORDER BY the sub-select so you never have a Inser,Update trying action in Update,Inser order. And assuming you want all update operations if there are many of them.. I will stop. But if you do not have Updates, the sub-select should have the op_type='I' to avoid the non-insert ops making it. Out, or possible worse again, in your ROW_NUMBER pattern replacing the Intserts. Which I suspect is the underlying cause of your problem.

snowflake merge statement using golden gate json as source table

2 Answers