0
votes

My query look like this

 SELECT
    DISTINCT id,
    STRING_AGG(DISTINCT column1) AS mobile,
    STRING_AGG(DISTINCT country) AS country,
    STRING_AGG(DISTINCT language) AS language,
    STRING_AGG(DISTINCT address) AS address,
    STRING_AGG(DISTINCT model) AS model,
    STRING_AGG(DISTINCT car) AS car,
    STRING_AGG(DISTINCT class) AS class,
    home_email,
    buisness_email,
    MAX(timestamp) AS timestamp
  FROM
    user
  GROUP BY
    id,
    home_email,
    buisness_email

When I run this query on my 2TB table in bigquery and set query settings to export the output to a table it produces an error Note: when i run it on 500 GB data size it works fine

Resources exceeded during query execution: Your project or organization exceeded the maximum disk and memory limit available for shuffle operations. Consider provisioning more slots, reducing query concurrency, or using more efficient logic in this job.

So How can I solve this please?

Also after this I will need to run the same query on a this table union with another 1TB table

1

1 Answers

1
votes

I can perform a proposition (but I don't know if it works, I don't have a big enough dataset. So, if not, i will delete this "answer-try")

The idea is to split the agg into different subqueries. and then to merge all these sub parts.

WITH agg_mobile AS (
 SELECT
    DISTINCT id,
    STRING_AGG(DISTINCT column1) AS mobile,
    home_email,
    buisness_email,
    MAX(timestamp) AS timestamp
  FROM
    user
  GROUP BY
    id,
    home_email,
    buisness_email
),
agg_country AS (
 SELECT
    DISTINCT id,
    STRING_AGG(DISTINCT country) AS country,
  FROM
    user
  GROUP BY
    id,
    home_email,
    buisness_email
)
....
SELECT agg_mobile.*,
       agg_country.country, 
       ....
FROM agg_mobile 
  LEFT JOIN agg_country ON agg_mobile.id = agg_country.id 
  LEFT JOIN .....