0
votes
select season,violation_code, cnt, 
       RANK() over (Partition BY season order by cnt desc) AS rank 
  from
    (  select season,violation_code, 
              count(*) as cnt 
         from  ParkingViolations_seondary 
     group by season,violation_code
    ) tmp
where rank <= 3

I'm new to Hive. Can somebody help me what is wrong with the above query? It throws the following error:

Error while compiling statement:

FAILED: SemanticException [Error 10004]: line 4:6 Invalid table alias or column reference 'rank': (possible column names are: season, violation_code, cnt)

Any quick help would be appreciated.

2

2 Answers

0
votes

Use subquery to be able to address rank in the where clause:

select season, violation_code, cnt, rnk
from
( select season,violation_code, cnt, 
         RANK() over (Partition BY season order by cnt desc) AS rnk
    from
        (  select season,violation_code, 
                count(*) as cnt 
             from  ParkingViolations_seondary 
         group by season,violation_code
        ) tmp
)s
where rnk <= 3
0
votes

Yes i was also able to get it working with the following:

SELECT * FROM ( SELECT season,violation_code, cnt, RANK() over (Partition BY season ORDER BY cnt DESC) AS frequency FROM (SELECT season,violation_code, COUNT(*) as cnt FROM ParkingViolations_seondary WHERE (violation_code <> 0) and (street_code1 <> 0 or street_code2 <> 0 or street_code3 <> 0) GROUP BY season,violation_code)TMP )TMP1 WHERE frequency <= 3;