0
votes

We have six hive tables with sample (example) structure like (where each table has millions of merchant records)

Table1
MerchntId ,field1, field2

Table2
MerchantId, field3,field4

Table3
MerchantId, field5,field6,field7

Table4
MerchantId, field8,field9,field10

Table5
MerchantId, field11, field12, field13

and so on

Requirement is to create a horizantal layout to take all unique merchants where at least one field has value for a merchantId A merchantId may or may not present in other tables.(for a merchant there may be records in other tables or may not be there)

Final Table MerchntId, field1, field2, field3, field4,
field5,field6,field7,field8,field9,field10,field11, field12, field13

output should be like after joining

i) 101 abc def                         ghi
ii) 102 ghj fert hyu ioj khhh jjh ddd aas fff kkk fff vvv ff 

for case (i) only three fields have values for case (ii) all fields have values

For this we are doing FULL OUTER JOIN on merchantId for two tables and so on and then creating the final table

Is there any better approach doing this ?

for eg. my current approach

SELECT distinct
(case when a.MerchntId IS NOT NULL then a.MerchntId else (case when      
b.MerchntId IS NOT NULL
then b.MerchntId else '' end ) end ) as MerchntId,
(case when a.field1 IS NOT NULL then a.field1 else '' end ) as field1,
(case when a.field2 IS NOT NULL then a.field2 else '' end ) as field2,
(case when b.field3 IS NOT NULL then b.field3 else '' end ) as field3,
(case when b.field4 IS NOT NULL then b.field4 else '' end ) as field4
from Table1 a
full outer join Table2 b
ON a.MerchntId = c.MerchntId;

full outer join of table 3 and table 4 and then full outer join of these two tables to create a final table

2

2 Answers

0
votes

I don't see any other option since your requirements explicitly translate to a full outer join. However, your query can be improved by using COALESCE and NVL:

SELECT
COALESCE(a.MerchntId, b.MerchntId) as MerchntId,
NVL(a.field1, '') as field1,
NVL(a.field2, '') as field2,
NVL(b.field3, '') as field3,
NVL(b.field4, '') as field4 
from Table1 a
full outer join Table2 b
ON a.MerchntId = c.MerchntId;

Also, I'm not sure why you use distinct in your query.

0
votes

Union all 6 table, substituting missed fields with nulls. Then Aggregate by MerchantId using min or max:

select MerchantId, max(field1) field1, max(field2) field2...max(field13) field13 from
(
select MerchntId field1, field2, null field3, null field4... null field13  from Table1 
union all
select MerchntId null field1, null field2, field3, field4... null field13  from Table2
union all
...

select MerchantId, null field1, null field2... field11, field12, field13
from table6
)s group by MerchantId

After this you can apply your logic with replacing nulls with '' if necessary