0
votes

I am just trying to understand the concept behind joining of 2 tables with an OR condition. My requirement is: I need to join 2 tables Table1 [colA, colB] and Table2 [colX, colY] on columns Table1.colA = Table2.colB but if colA is NULL the condition should be Table1.colB = Table2.colY.

Do I need to do join them separately and then do union? Or is there a way I can do it in one join? Note that I have millions of records in both tables and its a left join and the tables reside in HIVE. I don't have a reproducible example, just trying to understand the concept.

2
You might be able to use a couple case statements, for example your join expression could be on case when IsNull(colA) then colB else colA end = case when IsNull(colA) then colY else colX end this may cause performance issues though since like you said, you have millions of records.wbrugato

2 Answers

2
votes

While I'm not familiar with HiveQL, in SQL server this could be accomplished as follows:

SELECT *
FROM   table1 t1
       JOIN table2 t2
         ON COALESCE(t1.cola, t1.colb) = CASE
                                           WHEN t1.cola IS NULL THEN t2.coly
                                           ELSE t2.colx
                                         END  

The logic should be fairly readable.

0
votes

Translate your conditions directly:

SELECT *
FROM table1 t1 JOIN
     table2 t2
     ON (t1.cola = t2.colb) or
        (t1.cola is null and t1.colb = t2.coly)

Usually, or is a performance killer in joins. This wold often be expressed using two separate left joins:

SELECT . . . , COALESCE(t2a.col, t2b.col) as col
FROM table1 t1 LEFT JOIN
     table2 t2a
     ON (t1.cola = t2.colb) LEFT JOIN
     table2 t2b
     ON t1.cola is null and t1.colb = t2.coly;