5
votes

I have a UDF (GetUrlExt) that returns extension. (ex: jpg in /abc/models/xyz/images/top.jpg). The data is like shown below:

Date Time TimeTaken uristem  
9/5/2011 0:00:10 234 /abc/models/xyz/images/top.jpg  
9/5/2011 0:00:11 456 /abc/models/xyz/images/bottom.jpg  
9/5/2011 0:00:14 789 /abc/models/xyz/images/left.gif  
9/5/2011 0:00:16 234 /abc/models/xyz/images/top.pdf  
9/5/2011 0:00:18 734 /abc/models/xyz/images/top.pdf  
9/5/2011 0:00:19 654 /abc/models/xyz/images/right.gif  
9/5/2011 0:00:21 346 /abc/models/xyz/images/top.pdf  
9/5/2011 0:00:24 556 /abc/models/xyz/images/front.pdf  
9/5/2011 0:00:26 134 /abc/models/xyz/images/back.jpg

The query without 'GROUP BY' is working fine:

SELECT GetUrlExt(uristem) AS extn FROM LogTable; 

Result: jpg jpg gif pdf pdf gif pdf pdf jpg

Now I need 'GROUP BY' on the results of the GetUrlExt UDF.
Expected Result:
jpg 3 274.6
gif 2 721.5
pdf 4 467.5

But the following query is not working:

SELECT GetUrlExt(uristem) AS extn, Count(*) AS PerCount, Avg(TimeTaken) AS AvgTime FROM LogTable GROUP BY extn;

Any kind of help is appreciated!

2

2 Answers

7
votes

Pls use subquery to group by.

Hive doesn't support group by calculated value directly.

SELECT a.extn, Count(*) AS PerCount, Avg(TimeTaken) AS AvgTime 
FROM
(
    SELECT GetUrlExt(uristem) AS extn, TimeTaken
    FROM LogTable 
) a
GROUP BY a.extn;
1
votes

You can either enable group by alias, or you need the whole statement in group by

SELECT GetUrlExt(uristem) AS extn, Count(*) AS PerCount, Avg(TimeTaken) AS AvgTime 
FROM LogTable 
GROUP BY GetUrlExt(uristem);