Snowflake merge into is adding data even when condition is met and even if fields from target and source tables are already exists

Question

I am having the following Snowflake statement which will check if hashed fields coming from a stage file already exists in the target table and then do an insert when not matched:

MERGE INTO LINK_DIMENSION_FIELD AS D 
USING (SELECT md5(concat(T.$2, T.$4)) DIM_FIELD, T.$2 DIMENSION_NAME, T.$4 FIELD_NAME
       FROM @ingest_stage_temp/dimension_field.csv (FILE_FORMAT=>"GENERIC_CSV_FORMAT") T) ST 
ON md5(concat(D.DIMENSION_NAME_HASH_KEY, D.FIELD_NAME_HASH_KEY)) = ST.DIM_FIELD
WHEN NOT MATCHED THEN 
INSERT (DIMENSION_NAME_FIELD_NAME_HASH_KEY, LOAD_DT, RECORD_SRC, DIMENSION_NAME_HASH_KEY, FIELD_NAME_HASH_KEY) 
VALUES(MD5(CONCAT(ST.DIMENSION_NAME, ST.FIELD_NAME)), current_timestamp(), 'TEST', md5(ST.DIMENSION_NAME), md5(ST.FIELD_NAME));

The problem is the merge is always working even when md5(concat(D.DIMENSION_NAME_HASH_KEY, D.FIELD_NAME_HASH_KEY)) = ST.DIM_FIELD.

If you can see, this is the staged file after running the select query:

SELECT md5(concat(T.$2, T.$4)) DIM_FIELD, T.$2 DIMENSION_NAME, T.$4 FIELD_NAME
FROM @ingest_stage_temp/dimension_field.csv (FILE_FORMAT=>"GENERIC_CSV_FORMAT") T

The result is:

DIM_FIELD                           DIMENSION_NAME                  FIELD_NAME
87d7dae13cf0326fd03a348ca6c518b5    cg_child_6mo_receiv_ind_iycf    cg_child_6mo_receiv_ind_iycf/nbr_1st_cons_6mc_iycfc
2b75306f968f11b45f066efb9871babb    cg_child_6mo_receiv_ind_iycf    cg_child_6mo_receiv_ind_iycf/nbr_followup_2nd_time_6mc_iycfc
53273e7133d7a0b513af8c9bcc934437    preg_women_rec_ind_counselling  preg_women_rec_ind_counselling/nbr_1st_cons_pregw_iycfc

When I run a select query on the existing data:

select * from LINK_DIMENSION_FIELD;

You can clearly see the all values within DIM_FIELD are already in this table so the insert queries should not be done:

Marcel Marcel · Accepted Answer · 2021-01-14T10:53:27

In your ON-Clause you are comparing:

md5(concat(D.DIMENSION_NAME_HASH_KEY, D.FIELD_NAME_HASH_KEY)) = ST.DIM_FIELD

I think comparing ST.DIM_FIELD to DIMENSION_NAME_FIELD_NAME_HASH_KEY (the finally calculated column in your target table) can do the trick.

Snowflake merge into is adding data even when condition is met and even if fields from target and source tables are already exists

1 Answers