0
votes

I'm trying to run this query

SELECT
  id AS id,
  ARRAY_AGG(DISTINCT users_ids) AS users_ids,
  MAX(date) AS date
FROM
  users,
  UNNEST(users_ids) AS users_ids
WHERE
  users_ids != " 1111"
  AND users_ids != " 2222"
GROUP BY
  id;

Where users table is splitted table with id column and user_ids (comma separated) column and date column on a +4TB and it give me resources

Resources exceeded during query execution: Your project or organization exceeded the maximum disk and memory limit available for shuffle operations.

.. any idea why?

id    userids   date
1     2,3,4     1-10-20
2     4,5,6     1-10-20   
1     7,8,4     2-10-20

so the final result I'm trying to reach

id    userids       date
1     2,3,4,7,8     2-10-20
2     4,5,6         1-10-20 

Execution details: enter image description here enter image description here

1
are you under an organization? is there a slot limit applied by your manager? - Pentium10
@Pentium10 There is no slot limit in the organization - memok
Can you post details on your query plan? See cloud.google.com/bigquery/query-plan-explanation - Martin Weitzmann
@Martin Weitzmann Check the description now please - memok

1 Answers

1
votes

It's constantly repartitioning - I would guess that you're trying to cramp too much stuff into the aggregation part. Just remove the aggregation part - I don't even think you have to cross join here.

Use a subquery instead of this cross join + aggregation combo.

Edit: just realized that you want to aggregate the arrays but with distinct values

WITH t AS (
  SELECT
    id AS id,
    ARRAY_CONCAT_AGG(ARRAY(SELECT DISTINCT uids FROM UNNEST(user_ids) as uids WHERE
      uids != " 1111" AND uids != " 2222")) AS users_ids,
    MAX(date) OVER (partition by id) AS date
  FROM
    users
  GROUP BY id
)

SELECT 
  id, 
  ARRAY(SELECT DISTINCT * FROM UNNEST(user_ids)) as user_ids
  ,date

FROM t

Just the draft I assume id is unique but it should be something along those lines? Grouping by arrays is not possible ...

array_concat_agg() has no distinct so it comes in a second step.