0
votes

ex:Input

ID   Col1   Col2   Col3
--   ----   ----   ----
1       a     a     sql  
2       a     a    hive

Out put

ID   Col1   Col2   Col3
--   ----   ----   ----
1       a     a     sql  

Here my id value and Col3 values are unique but i need to filter on min id and populate all records.

I know below approach will work, but any best approach other than this please suggest

select Col1,Col2,min(ID) from table group by Col1,Col2;

and join this on ID,Col1,Col2

2

2 Answers

1
votes

I think you want row_number():

select t.*
from (select t.*, row_number() over (partition by col1, col2 order by id) as seqnum
      from t
     ) t
where seqnum = 1
1
votes

It appears that Hive supports ROW_NUMBER. Though I’ve never used hive, other rdbms would use it like this to get the entire contents of the min row without needing to join (doesn’t suffer problems if there are repeated minimum values)

SELECT a.* FROM 
(
  SELECT *, ROW_NUMBER() OVER(ORDER BY id) rn FROM yourtable
) a
WHERE a.rn = 1

The inner query selects all the table data and establishes an incrementing counter in order of ID. It could be based on any column, the min ID (in this case) being row number 1. If you wanted the max, order by ID desc
If you want the number to restart for different values of another column (eg of ten of your Col3 were “sql” and twenty rows had “hive”) you an say PARTITION BY col3 ORDER BY id, and the row number will be a counter that increments for identical values of col3, restarting from 1 for each distinct value of col3