I'm trying to do a full otuer join between 5 tables. I am trying to do it using an "anchor table" and then performing a left outer join. Select all distinct key fields from each table then from those do left outer joins on each of the 5 tables. Doing this with HIVE. Here is what I have...
select
COALESCE(a.key,b.key,c.key,d.key,e.key,NULL) key,
a.field_1,
b.field_2,
c.field_3,
d.field_4,
e.field_5
from (
select distinct key from table_1
union
select distinct key from table_2
union
select distinct key from table_3
union
select distinct key from table_4
union
select distinct key from table_5
) as keyvalues
LEFT OUTER JOIN table_1 as a on a.key = keyvalues.key
LEFT OUTER JOIN table_2 as b on b.key = keyvalues.key
LEFT OUTER JOIN table_3 as c on c.key = keyvalues.key
LEFT OUTER JOIN table_4 as d on d.key = keyvalues.key
LEFT OUTER JOIN table_5 as e on e.key = keyvalues.key
The above code is generic,I have many more fields than that but that is the idea. What my results are showing are duplicate key values. That's because key is not necessarily a primary key across those tables. So my result i would like to have no duplicates. Almost if I could take a distinct of the "keyvalues" variable before I start doing those left outer joins.
coalesce keypart can just be keyvalues.key. - avery_larry