
I've successfully used a complex type (in my case, a map) in a GROUP BY clause. But when I try to use a map in a PARTITION BY clause, I get an error:

SELECT *, MIN(dt) OVER(PARTITION BY some_map) FROM some_table

FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: Partition Expression (TOK_TABLE_OR_COL query_params) is not a comparable expression

So does Hive support complex types in PARTITION BY? And if it doesn't, what can I do instead, given that I need to preserve all the other column values (so I can't use GROUP BY)?

What do I mean? For example, I have a table user_queries:

user_id|query_params      |result_position|dt
1      |{'text':'query1'} |0              |2017-04-01 10:00
1      |{'text':'query1'} |1              |2017-04-01 12:00
1      |{'text':'query2'} |0              |2017-04-01 13:00
2      |{'text':'query1'} |0              |2017-04-01 09:00

I perform this query:

SELECT
    user_id,
    query_params,
    result_position,
    MIN(dt) OVER(PARTITION BY user_id, query_params) AS dt
FROM user_queries

And I want to get this (look at the dt in the second line):

user_id|query_params      |result_position|dt
1      |{'text':'query1'} |0              |2017-04-01 10:00
1      |{'text':'query1'} |1              |2017-04-01 10:00
1      |{'text':'query2'} |0              |2017-04-01 13:00
2      |{'text':'query1'} |0              |2017-04-01 09:00
Whether it's possible or not, doing that seems unreasonable. - David דודו Markovitz
@Dudu Markovitz Could you explain why? For example, I have a map with lots of fields. GROUP BY treats two maps as equal if they have the same keys and the values for each key are equal. I don't want to enumerate all of the map's values, so isn't this reasonable? - sophie.sorokina
It is similar to group by * (pseudo code). Can you show a concrete example? - David דודו Markovitz
@Dudu Markovitz I've added an example to my question. Sorry, it's the first time I'm asking on Stack Overflow. - sophie.sorokina
O.K. That actually makes sense :-) This is your way to identify repeated user queries. - David דודו Markovitz

1 Answer

select  user_id
       ,query_params
       ,result_position

       ,min(dt) over
        (
            partition by    user_id
                           ,sort_array (map_keys    (query_params))
                           ,sort_array (map_values  (query_params))
        ) as min_dt

from    user_queries
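
A note on why this seems to work: the error in the question says the map is "not a comparable expression", and the query above sidesteps that by partitioning on two arrays derived from the map (its sorted keys and its sorted values), relying on arrays of primitive values being accepted as partition expressions. One caveat is that sorting keys and values independently drops the pairing between them. The sketch below uses two made-up maps (hypothetical data, not from the question) to show the effect; it assumes a Hive version that allows SELECT without FROM (0.13 or later).

```sql
-- Hypothetical data: two different maps with the same keys and the same set of values.
select  m
       ,sort_array (map_keys    (m)) as sorted_keys
       ,sort_array (map_values  (m)) as sorted_values

from   (
            select map('a', '1', 'b', '2') as m
            union all
            select map('a', '2', 'b', '1') as m
        ) t
;
-- Both rows give sorted_keys = ["a","b"] and sorted_values = ["1","2"],
-- so these two maps would land in the same partition.
```

For single-key maps like {'text':'query1'} in the question this cannot happen, but with many fields per map it may be worth checking whether such collisions matter for your data.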