0
votes

I have two datasets: 1)

Table A 
id  name
1   raju
2   ramu
2   ramu
3   rakesh

Table A

2)

Table B 
id  Status
1   Y
1   Y
2   N
2   N
2   Y
3   N

Table B

I want to perform a left outer join (Table A Left outer join Table B) in hive in such a way that while joining the two datasets on column 'id', if the Status column in table B has 'Y' at least once, then the resultant dataset will have the Status as 'Yes' like below:

Final Result:       
id  name    Status
1   raju    Yes
2   ramu    Yes
2   ramu    Yes
3   rakesh  No

I do not want to increase the number of records in the result while performing the join. The result table should have only 4 records, not 5(increased records based on the matching join condition).

How do I achieve this?

1

1 Answers

1
votes

To meet this requirement, you need to reduce the row count of your table B to either one row or zero rows per row of table A.

Do that like this, getting one row for every id value with a status of Y.

         SELECT DISTINCT id, Status
           FROM B
          WHERE Status = 'Y'

Then you can use the old LEFT JOIN ... IS NOT NULL trick to figure out which rows of A have matching rows of B.

This does it. (http://sqlfiddle.com/#!9/71d84b/1/0)

SELECT A.id, A.name, 
       CASE WHEN B.Status IS NOT NULL THEN 'Yes' ELSE 'No' END Status
  FROM A
  LEFT JOIN (
         SELECT DISTINCT id, Status
           FROM B
          WHERE Status = 'Y'
       ) B ON A.id = B.id