1
votes

For one of our code, left join is not behaving properly in snowflake. Require your help if you can find solution around the same.

We have a sample data setup like mentioned below with basicc table join.

CREATE TABLE patient_test(pid INT);
INSERT INTO patient_test (pid) VALUES (100);

CREATE TABLE pateint_entry_test (pid INT,DateAdded DATETIME);
INSERT INTO pateint_entry_test (pid, DateAdded) VALUES (100, '2020-07-13');

Now look below code where I am just giveing you sample sub query that we are using with other query set. Where our motivation was to get date entry for each patient based on given start/end date.


WITh patient_cte  AS(
          SELECT * FROM patient_test
      )
      ,
      dates AS(
       SELECT  DATEDIFF(day, CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-06') AS TIMESTAMP_NTZ)),
                            CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-12') AS TIMESTAMP_NTZ))) AS Total_Days,
                            CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-06') AS TIMESTAMP_NTZ)) AS Start_Date,
                            CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-12') AS TIMESTAMP_NTZ)) AS end_date
      )
      ,
      cte2 (date) as (
      SELECT TO_DATE(START_DATE) FROM dates
      UNION ALL
      SELECT TO_DATE(DATEADD(day, 1, date)) FROM cte2 WHERE date < (SELECT TOP 1 END_DATE FROM dates)
      ),
      cte3 AS (
          select * from patient_cte
              cross join cte2
      )

      SELECT cte3.pid as p_pid,
        pateint_entry_test.pid as p_entry_pid,
        pateint_entry_test.DateAdded,
        cte3."DATE" ,
        IFNULL( pateint_entry_test.DateAdded, cte3."DATE") AS CALCULATEDDATEMEASURED
     FROM cte3
        LEFT JOIN pateint_entry_test ON
            cte3.pid = pateint_entry_test.pid AND
            cte3."DATE" = TO_DATE(pateint_entry_test.DateAdded)

Output of the query gives result as below. enter image description here

Where you can see CALCULATEDDATEMEASURED for Row number 2 to 7 are coming as 2020-07-06 00:00:00. But as DAETADDED is coming for null then it should come proper date based on DATE column value ( Based on this condition IFNULL( pateint_entry_test.DateAdded, cte3."DATE"))

Expecting below output from the query enter image description here

Not sure what is wrong, but its not behaving as expected. Aprreciate your help on this. Thank you.

1
This looks like a bug that you should report to Snowflake. I am curious why you are using the non-standard IFNULL() rather than the standard COALESCE(). I doubt that makes a difference to your results, however.Gordon Linoff
Thanks @GordonLinoff for reverting back, tried coalesce() earlier but it was behaving same way.rajusem
Should this statement have an order by date? (SELECT TOP 1 END_DATE FROM dates) This is difficult to reproduce or help debug with example data for the tables that you are referencing in your example code.Mike Walton
Thanks @MikeWalton for reverting, its a working sample query i have added here for easy debugging. You can run the same codebase in snowflake environment as well. Tried to add order by like you suggested but its not helping.rajusem
Please use text, not images/links, for text--including tables & ERDs. Use images only for what cannot be expressed as text or to augment text. Include a legend/key & explanation with an image.philipxy

1 Answers

0
votes

I'm not sure if this is a bug, but it's due to type coercion based on the way you wrote your query. Here's your query with the TO_DATE logic applied in the IFNULL statement the same as it is in the join (along with a COALESCE to show that it produces the same result as IFNULL):

WITh patient_cte  AS(
          SELECT * FROM patient_test
      )
      ,
      dates AS(
       SELECT  DATEDIFF(day, CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-06') AS TIMESTAMP_NTZ)),
                            CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-12') AS TIMESTAMP_NTZ))) AS Total_Days,
                            CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-06') AS TIMESTAMP_NTZ)) AS Start_Date,
                            CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-12') AS TIMESTAMP_NTZ)) AS end_date
      )
      ,
      cte2 (date) as (
      SELECT TO_DATE(START_DATE) FROM dates
      UNION ALL
      SELECT TO_DATE(DATEADD(day, 1, date)) FROM cte2 WHERE date < (SELECT TOP 1 END_DATE FROM dates)
      ),
      cte3 AS (
          select * from patient_cte 
              cross join cte2 
      )
      SELECT cte3.pid as p_pid,
        pateint_entry_test.pid as p_entry_pid,
        pateint_entry_test.DateAdded,
        cte3."DATE",
        IFNULL( pateint_entry_test.DateAdded, cte3."DATE") AS ORIGINAL_ERROR,
        IFNULL( to_date(pateint_entry_test.DateAdded), cte3."DATE") AS CALCULATEDDATEMEASURED,
        coalesce(to_date(pateint_entry_test.DateAdded), cte3."DATE") as from_coalesce
     FROM cte3 
        LEFT JOIN pateint_entry_test 
            ON cte3.pid = pateint_entry_test.pid 
            AND cte3."DATE" = to_date(pateint_entry_test.DateAdded);

Running this in Snowflake produces this: enter image description here