1
votes

I have a table with data at hour level. I want to find the count of hours and the values for col1 and col2 for all hours in an array. Input Table

+-----+-----+-----+
| hour| col1| col2|
+-----+-----+-----+
| 00  | 0.0 | a   |
| 04  | 0.1 | b   |
| 08  | 0.2 | c   |
| 12  | 0.0 | d   |
+-----+-----+-----+

I am using the below query to get the column values in an array

Query: select count(hr), map_values(str_to_map(concat_ws(',',collect_set(concat_ws(':',reflect('java.util.UUID','randomUUID'),cast(col1 as string)))))) as col1_arr, map_values(str_to_map(concat_ws(',',collect_set(concat_ws(':',reflect('java.util.UUID','randomUUID'),cast(col2 as string)))))) as col2_arr from table;

Output that i am getting, values in col2_arr are not in the same sequence with col1_arr. Please suggest how can i get the values in array/list for different columns in same sequence.

+----------+-----------------+----------+
| count(hr)| col1_arr        | col2_arr | 
+----------+-----------------+----------+
| 4        | 0.0,0.1,0.2,0.0 | b,a,c,d  | 
+----------+----------------+-----------+

Required output:

+----------+-----------------+----------+
| count(hr)| col1_arr        | col2_arr | 
+----------+-----------------+----------+
| 4        | 0.0,0.1,0.2,0.0 | a,b,c,d  | 
+----------+----------------+-----------+

Thanks

1

1 Answers

0
votes
select  count(*) as cnt 
       ,concat_ws(',',sort_array(collect_list(hour)))  as hour
       ,regexp_replace(concat_ws(',',sort_array(collect_list(concat_ws(':',hour,cast(col1 as string))))),'..:','') as col1
       ,regexp_replace(concat_ws(',',sort_array(collect_list(concat_ws(':',hour,col2)))),'..:','') as col2

from    mytable
;

+-----+-------------+-------------+---------+
| cnt |    hour     |    col1     |  col2   |
+-----+-------------+-------------+---------+
|   4 | 00,04,08,12 | 0,0.1,0.2,0 | a,b,c,d |
+-----+-------------+-------------+---------+