10
votes

I try to write Hive Sql like that

SELECT count(1), substr(date, 1, 4) as year
FROM ***
GROUP BY year

But Hive cannot recognize the alias name 'year', it complains that: FAILED: SemanticException [Error 10004]: Line 1:79 Invalid table alias or column reference 'year'

One solution(Hive: SELECT AS and GROUP BY) suggest to use 'GROUP BY substr(date, 1, 4)'.

It works! However in some cases the value I want to group by may be generated from multiple lines of hive function code, it's very ugly to write code like

SELECT count(1), func1(func2(..........................)) AS something
FROM ***
GROUP BY func1(func2(..........................))

Is there any clean way in Hive to do that? Any suggestions?

3

3 Answers

24
votes

Specifying the position in Group By will solve your issue. This position number in Group By works even when SET hive.groupby.orderby.position.alias=false; (Hive 0.12)

SELECT count(1), substr(date, 1, 4) as year  
FROM ***
GROUP BY 2;
7
votes

In Hive 0.11.0 and later, columns can be specified by position if hive.groupby.orderby.position.alias is set to true (the default is false). So setting set hive.groupby.orderby.position.alias=true; in your .hql (or .hiverc for a permanent solution) will do the trick and then you can type group by 2 for the above example. Source: hive language manual

0
votes

One solution that comes to mind is put GROUP BY to outer query:

SELECT count(*) , year FROM 
(
   SELECT substr(date, 1, 4) as year FORM ***
) inner
GROUP BY year

GL!