0
votes

I need to take the distinct values from Table 2 while joining with Table 1 in Hive. Because the table 2 has duplicate records.

Considering below join condition is it possible to take only distinct key_col from table 2? i dont want to use select distinct * from ...

select * from Table_1 a left join Table_2 b on a.key_col = b.key_col

Note: This is in Hive

1

1 Answers

1
votes

Use Left semi join. This will give you all the record in table1 which exist in table2(duplicate record) without duplicates.

select a.* from Table_1 a left semi join Table_2 b on a.key_col = b.key_col