1
votes

My problem statement is like

"Find top 2 districts per state with the highest population"

data is like

Input

My expected output is

output

I tried this with lot of queries and sub-queries but results in SQL error with the sub query

Can anyone help me with getting this result?

Thanks in advance.

Queries I tried

  1. Select state_name, (select concat_ws(',', collect_set(dist_name as string)) from population where state_name = state_name group by state order by population desc 2)

from population group by state_name

  1. select
    state_name, concat_ws(',', collect_set(cast(dist_name as string)))
    from population where population.dist_name in (select dist_name from ( select dist_name , max(b.population) as total from population b where state_name = b.state_name group by b.dist_name , b.dist_name order by total desc limit 2) as dist_name ) group by state_name
1
It is better to post the text here, than in images. Also you must show us what you have done so far. The queries that you have tried. - alex.pulver
Can you post what queries did you try so far? - Nishu Tayal

1 Answers

0
votes

Below would be the query -

 select A.state, collect_set(A.dist)[0], collect_set(A.dist)[1] from 
(select state, dist, row_number() over (partition by state order by population 
 desc) as rnk from <tableName>) A
where A.rnk<=2 group by A.state;

Below are the results for sample data -

hive> select * from hier;
OK
C1      C11
C11     C12
C12     123
P1      C1
P2      C2

hive> select parent, collect_set(child)[0], collect_set(child)[1] from hier group by parent;
OK
C1      C11     NULL
C11     C12     NULL
C12     123     NULL
P1      C1      NULL
P2      C2      NULL
Time taken: 19.212 seconds, Fetched: 5 row(s)