0
votes

Is there a way to combined (concatenate) Nested fields in a group by in Clickhouse in an AggregatedMergeTree materialied view?

Imagine that I have a table with a schema (simplified) like this:

CREATE TABLE test
(
    key1 String,
    key2 String,
    clicks Int32,
    points Nested(x Int32, y Int32)
) Engine = Log

I would like to be able to use an AggregatingMergeTree to generate a materialized view that combines the nested fields by "concatenating" them (as if nested records could be just concatenated as complex values as some SQL dialects can).

If I do that as a query it's possible:

SELECT 
    key1,
    key2,
    arrayMap(p -> p.1, points) as x,
    arrayMap(p -> p.2, points) as y
FROM
(
    SELECT  
        key1,
        key2,
        groupArray(tuple(x, y)) as points
    FROM
    (
        SELECT 
            key1, key2, points.x as x, points.y as y
        FROM test 
        ARRAY JOIN points
    )
    GROUP BY key1, key2
)

Is there a way to express this in the query used in a materialized view based on the AggregatingMergeTree engine? The best I could come up with is something like this:

CREATE MATERIALIZED VIEW testagg1
  engine = AggregatingMergeTree partition by key1 order by (key1, key2)
AS 
SELECT
    key1,
    key2,
    sumState(clicks) as clicks,
    groupArrayState(points.x) as `points.x`,
    groupArrayState(points.y) as `points.y`
FROM test
GROUP BY key1, key2

I can then get the flattened form using this query:

SELECT 
    arrayMap(p -> p.1, arrayZip(x, y)) as x1,
    arrayMap(p -> p.2, arrayZip(x, y)) as y1
FROM
(
    SELECT 
        key1,
        key2,
        groupArrayMerge(`points.x`) as x, 
        groupArrayMerge(`points.y`) as y 
    FROM testagg1
    GROUP BY key1, key2
) as points 
ARRAY JOIN x, y

It works but seems a bit complex.

  • Is there a simpler and better way to do this?
  • Are the groupArrayState and groupArrayMerge aggregation used above guaranteed to preserve the same ordering of the x/y fields in the parallel arrays?
1

1 Answers

1
votes

Nested(x Int32, y Int32) -- is a syntax sugar for a create table command to reduce arrays boilerplate.

desc test
┌─name─────┬─type─────────┬
│ key1     │ String       │
│ key2     │ String       │
│ clicks   │ Int32        │
│ points.x │ Array(Int32) │
│ points.y │ Array(Int32) │
└──────────┴──────────────┴

The best I could come up with is something like this: groupArrayState(points.x) as points.x, groupArrayState(points.y) as points.y

It's the only way. And it's the official/right CH way.

aggregation used above guaranteed to preserve the same ordering of the x/y fields in the parallel arrays?

yes, it's guaranteed.

SELECT arrayMap(p -> p.1, arrayZip(x, y)) as x1, arrayMap(p -> p.2, arrayZip(x, y)) as y1

It's the same as

SELECT x,y

isn't it?