4
votes

I'm trying to group data by minutes, so I tried this query:

SELECT FROM_UNIXTIME(
     unix_timestamp (time, 'yyyy-mm-dd hh:mm:ss'), 'yyyy-mm-dd hh:mm') as ts,
     count (*) as cnt 
     from toucher group by ts limit 10;

Then hive tells me no such column,

FAILED: SemanticException [Error 10004]: Line 1:134 Invalid table alias or column reference 'ts': (possible column names are: time, ip, username, code)

So is it not supported by hive?

2
The column TS is not in scope here ! you need to write this in outer query.Ashutosh Arya

2 Answers

6
votes
SELECT FROM_UNIXTIME(unix_timestamp (time, 'yyyy-mm-dd hh:mm:ss'), 'yyyy-mm-dd hh:mm') as ts,
     count (*) as cnt 
from toucher 
group by FROM_UNIXTIME(unix_timestamp (time, 'yyyy-mm-dd hh:mm:ss'), 'yyyy-mm-dd hh:mm') limit 10;

or and better

 select t.ts, count(*) from
(SELECT FROM_UNIXTIME(unix_timestamp (time, 'yyyy-mm-dd hh:mm:ss'), 'yyyy-mm-dd hh:mm') as ts             
    from toucher ) t
    group by t.ts limit 10;
2
votes

As is the case with most relational database systems, the SELECT clause is processed after the GROUP BY clause. This means you cannot use columns aliased in the SELECT (such as ts in this example) in your GROUP BY.

There are essentially two ways around this. Both are correct, but some people have preference for one over the other for various reasons.

First, you could group by the original expression, rather than the alias. This results in duplicate code, as you will have the exact same expression in both your SELECT and GROUP BY clause.

SELECT 
    FROM_UNIXTIME(unix_timestamp(time,'yyyy-mm-dd hh:mm:ss'),'yyyy-mm-dd hh:mm') as ts,
    COUNT(*) as cnt 
FROM toucher 
GROUP BY FROM_UNIXTIME(unix_timestamp(time,'yyyy-mm-dd hh:mm:ss'),'yyyy-mm-dd hh:mm')
LIMIT 10;

A second approach is to wrap your expression and alias in a subquery. This means you do not have to duplicate your expression, but you will have two nested queries and this may have performance implications.

SELECT 
    ts,
    COUNT(*) as cnt 
FROM 
    (SELECT
        FROM_UNIXTIME(unix_timestamp(time,'yyyy-mm-dd hh:mm:ss'),'yyyy-mm-dd hh:mm') as ts,
     FROM toucher) x
GROUP BY x.ts
LIMIT 10;

Both should have the same result. Which you should use in this case will depend on your particular use; or perhaps personal preference.

Hope that helps.