0
votes

I am experiencing unusual behavior with Redshift where a basic query with inner joins has just stopped working as it is expected to.

Providing the query below,

SELECT D.asin, D.client_account_id
FROM ams.t_asin_segmentation_detail A_0
       INNER JOIN ams.t_asin_segmentation_value B
                  ON A_0.ASIN_SEGMENTATION_VALUE_ID = B.ID AND A_0.IS_DELETED = 'N' AND A_0.IS_ACTIVE = 'Y' AND
                     B.IS_DELETED = 'N' AND B.IS_ACTIVE = 'Y' AND B.ID IN (900)
       INNER JOIN ams.t_asin_segmentation_type C
                  ON B.ASIN_SEGMENTATION_TYPE_ID = C.ID AND C.IS_DELETED = 'N' AND C.IS_ACTIVE = 'Y' AND
                     C.ID = 1687 AND C.BUSINESS_UNIT_ID = 15
       INNER JOIN ams.t_asin D ON A_0.asin_id = D.ID AND D.is_active = 'Y' and D.is_deleted = 'N';    

This Query is returning just a single row which is incorrect. It is supposed to return more than a 1000 rows.

When I write the same query differently, just removing the last JOIN with table aliased D and writing it this way,

SELECT asin, client_account_id
FROM ams.t_asin
WHERE id IN (SELECT DISTINCT A_0.asin_id
             FROM ams.t_asin_segmentation_detail A_0
                    INNER JOIN ams.t_asin_segmentation_value B
                               ON A_0.ASIN_SEGMENTATION_VALUE_ID = B.ID AND A_0.IS_DELETED = 'N' AND
                                  A_0.IS_ACTIVE = 'Y' AND
                                  B.IS_DELETED = 'N' AND B.IS_ACTIVE = 'Y' AND B.ID IN (900)
                    INNER JOIN ams.t_asin_segmentation_type C
                               ON B.ASIN_SEGMENTATION_TYPE_ID = C.ID AND C.IS_DELETED = 'N' AND C.IS_ACTIVE = 'Y' AND
                                  C.ID = 1687 AND C.BUSINESS_UNIT_ID = 15)
  AND is_active = 'Y'
  AND is_deleted = 'N';

This starts fetching the correct number of rows without any logical change in the Query itself.

I further tweaked with the original query and just replaced the INNER JOIN with table aliased D to a LEFT JOIN and surprisingly it started giving the correct number of rows. It is important to note that even with a LEFT JOIN, the SELECT statement is still fetching columns from table aliased D which as shown below,

SELECT D.asin, D.client_account_id
FROM ams.t_asin_segmentation_detail A_0
       INNER JOIN ams.t_asin_segmentation_value B
                  ON A_0.ASIN_SEGMENTATION_VALUE_ID = B.ID AND A_0.IS_DELETED = 'N' AND A_0.IS_ACTIVE = 'Y' AND
                     B.IS_DELETED = 'N' AND B.IS_ACTIVE = 'Y' AND B.ID IN (900)
       INNER JOIN ams.t_asin_segmentation_type C
                  ON B.ASIN_SEGMENTATION_TYPE_ID = C.ID AND C.IS_DELETED = 'N' AND C.IS_ACTIVE = 'Y' AND
                     C.ID = 1687 AND C.BUSINESS_UNIT_ID = 15
       LEFT JOIN ams.t_asin D ON A_0.asin_id = D.ID AND D.is_active = 'Y' and D.is_deleted = 'N';

Would appreciate if anyone can provide any reason for this to happen.

2
You only have one row where A_0.asin_id = D.ID AND D.is_active = 'Y' and D.is_deleted = 'N' And those D.ID, D.is_Active, D.is_deleted into your SELECT clause and see what's up. - JNevill
@JNevill There are 1023 rows to be precise that qualify the join condition not one row. That can be established by the second query where I still have the same conditions and just using WHERE IN instead of INNER JOIN. - Aneeb Khawar

2 Answers

0
votes

Your queries are nothing similar. Your first query is saying that all matching rows have the same values of D.asin and D.client_account_id. That is why you are getting one row.

The second query has no distinct in the outer query, so it will fetch all those rows.

I'm not sure why you would think that these are logically equivalent.

0
votes

Got this resolved by changing the encoding type on Sort Keys from ZSTD to RAW. Still not entirely sure how that could have led to change in behavior of INNER JOIN.

The Sort Keys in these queries are all the ID columns being used in joins.

Both queries (INNER JOIN and WHERE IN) returning the same rows after this change.