2
votes

I have the following hive table

ID, class,value

1, A, 0.3

1, B, 0.4

1, C, 0.5

2, B, 0.1

2, C, 0.2

I want to get

ID, class:value

1, [A:0.3, B:0.4, C:0.5]

2, [B:0.1, C:0.2]


I know that there is a collect_set() UDAF that produces a list of class or list of value, is there anyway to get a list of key:value pairs?

NOTE: I guess I can use two collect_set() one for class column and one for value column but I am not sure if the lists will be in the same order.

2
There is a possible collect() in brickhouse an open source project with useful UDFs. brickhouseconfessions.wordpress.com/2013/02/21/… - viper

2 Answers

3
votes

I've used the UnionUDAF from the Brickhouse library to do something similar. You create a map from each pair, and then union them all together during the aggregation.

Add JAR brickhouse.jar;
create temporary function BH_union as 'brickhouse.udf.collect.UnionUDAF';

SELECT S.ID, BH_union(S.v_map) 
FROM (SELECT ID, map(class, value) as v_map from mytable) S
GROUP by S.ID
0
votes

You can use custom Map/Reduce scripts and collect_list() (from Hive 0.13.0) to achieve the same.

Let me know if you need more help in this.