0
votes

I am trying to union a bunch of tables and some of these fields are of type array of bigint. When I try to cast these in Hive, it keeps giving me a "Expression not in GROUP BY key 'field_name'.

My code is as follows:

    CREATE TABLE sample_table stored AS avro AS 
    SELECT cast(cast(rand()* 1000000000000000000 AS BIGINT) AS string)        id, 
           'PERSON'AS person, 
           name AS name, 
           age AS age, 
           gender AS gender
FROM   table_a
UNION ALL 
SELECT cast(cast(rand()* 1000000000000000000 AS BIGINT) AS string)    id,  
          'PERSON'AS person, 
           name AS name, 
           collect_list(
    CAST(NULL AS BIGINT)
  ) AS age, 
           null AS gender
FROM   table_b

The error being generated is as follows:

SQL Error [500051] [HY000]: [Cloudera]HiveJDBCDriver ERROR processing query/statement. Error Code: 10025, SQL state: TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Error while compiling statement: FAILED: SemanticException [Error 10025]: Line 4:7 Expression not in GROUP BY key 'age':28:27, org.apache.hive.service.cli.operation.Operation:toSQLException:Operation.java:400, org.apache.hive.service.cli.operation.SQLOperation:prepare:SQLOperation.java:187

1
i assume this SQL code isn't complete as no grouping SQL keyword is used here like GROUP BY/DISTINCT where the error (Expression not in GROUP BY key ) is based on?Raymond Nijland
From my understanding I think the group by is being triggered via the underlying tables being called but I'm not sure.. This is the complete code as I'm running it in my envravecoder
" This is the complete code as I'm running it in my env " Check if those tables are views then maybe?Raymond Nijland
That's the first thing I checked. Looks like they are all good. They are being grouped by the desired fields.. Any other thoughts?ravecoder

1 Answers

1
votes

collect_list is aggregation function. You need group by every non-constant field.

SELECT cast(cast(rand()* 1000000000000000000 AS BIGINT) AS string)    id,  
          'PERSON'AS person, 
           name, 
           collect_list(CAST(NULL AS BIGINT)) AS age, 
           null AS gender
FROM   table_b
group by cast(cast(rand()* 1000000000000000000 AS BIGINT) AS string), name