0
votes

I am having the following Snowflake statement which will check if hashed fields coming from a stage file already exists in the target table and then do an insert when not matched:

MERGE INTO LINK_DIMENSION_FIELD AS D 
USING (SELECT md5(concat(T.$2, T.$4)) DIM_FIELD, T.$2 DIMENSION_NAME, T.$4 FIELD_NAME
       FROM @ingest_stage_temp/dimension_field.csv (FILE_FORMAT=>"GENERIC_CSV_FORMAT") T) ST 
ON md5(concat(D.DIMENSION_NAME_HASH_KEY, D.FIELD_NAME_HASH_KEY)) = ST.DIM_FIELD
WHEN NOT MATCHED THEN 
INSERT (DIMENSION_NAME_FIELD_NAME_HASH_KEY, LOAD_DT, RECORD_SRC, DIMENSION_NAME_HASH_KEY, FIELD_NAME_HASH_KEY) 
VALUES(MD5(CONCAT(ST.DIMENSION_NAME, ST.FIELD_NAME)), current_timestamp(), 'TEST', md5(ST.DIMENSION_NAME), md5(ST.FIELD_NAME));

The problem is the merge is always working even when md5(concat(D.DIMENSION_NAME_HASH_KEY, D.FIELD_NAME_HASH_KEY)) = ST.DIM_FIELD.

If you can see, this is the staged file after running the select query:

SELECT md5(concat(T.$2, T.$4)) DIM_FIELD, T.$2 DIMENSION_NAME, T.$4 FIELD_NAME
FROM @ingest_stage_temp/dimension_field.csv (FILE_FORMAT=>"GENERIC_CSV_FORMAT") T

The result is:

DIM_FIELD                           DIMENSION_NAME                  FIELD_NAME
87d7dae13cf0326fd03a348ca6c518b5    cg_child_6mo_receiv_ind_iycf    cg_child_6mo_receiv_ind_iycf/nbr_1st_cons_6mc_iycfc
2b75306f968f11b45f066efb9871babb    cg_child_6mo_receiv_ind_iycf    cg_child_6mo_receiv_ind_iycf/nbr_followup_2nd_time_6mc_iycfc
53273e7133d7a0b513af8c9bcc934437    preg_women_rec_ind_counselling  preg_women_rec_ind_counselling/nbr_1st_cons_pregw_iycfc

When I run a select query on the existing data:

select * from LINK_DIMENSION_FIELD;

You can clearly see the all values within DIM_FIELD are already in this table so the insert queries should not be done:

enter image description here

1

1 Answers

1
votes

In your ON-Clause you are comparing:

md5(concat(D.DIMENSION_NAME_HASH_KEY, D.FIELD_NAME_HASH_KEY)) = ST.DIM_FIELD

I think comparing ST.DIM_FIELD to DIMENSION_NAME_FIELD_NAME_HASH_KEY (the finally calculated column in your target table) can do the trick.