1
votes

With Google BigQuery, I'm running a query with a group by and receive the error, "Resources exceeded during query execution The query could not be executed in the allotted memory. Peak usage: 152% of limit. Top memory consumer(s): sort operations used for analytic OVER() clauses: 99% other/unattributed: 1%".

I am using this query -

    SELECT
  CASE
    WHEN (sourceId = 1 AND web_id IS NOT NULL) THEN LAST_VALUE(Name IGNORE NULLS) OVER (PARTITION BY dgId, web_id ORDER BY event_timestamp ASC)
    WHEN (sourceId IN (2,
      4)
    AND zc IS NOT NULL) THEN coalesce(LAST_VALUE(Name IGNORE NULLS) OVER (PARTITION BY dgId, zc ORDER BY event_timestamp ASC),
    FIRST_VALUE(Name IGNORE NULLS) OVER (PARTITION BY dgId, zc ORDER BY event_timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING))
  ELSE
  Name
END
  AS Name,
  session_id,
  CASE
    WHEN (sourceId = 1 AND web_id IS NOT NULL) THEN LAST_VALUE(user_id IGNORE NULLS) OVER (PARTITION BY dgId, web_id ORDER BY event_timestamp ASC)
    WHEN (sourceId IN (2,
      4)
    AND zc IS NOT NULL) THEN coalesce(LAST_VALUE(user_id IGNORE NULLS) OVER (PARTITION BY dgId, zc ORDER BY event_timestamp ASC),
    FIRST_VALUE(user_id IGNORE NULLS) OVER (PARTITION BY dgId, zc ORDER BY event_timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING))
  ELSE
  user_id
END
  AS user_id
FROM (
  SELECT
    CASE
      WHEN (sourceId = 1 AND web_id IS NOT NULL) THEN FIRST_VALUE(consent_resolved IGNORE NULLS) OVER (PARTITION BY dgId, web_id ORDER BY event_timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)WHEN (sourceId IN (2, 4) AND zc IS NOT NULL) THEN FIRST_VALUE(consent_resolved IGNORE NULLS) OVER (PARTITION BY dgId, zc ORDER BY event_timestamp ASC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
    ELSE
    consent_resolved
  END
    AS consent_resolved,
    * EXCEPT(consent_resolved)
  FROM (
    SELECT
      CASE
        WHEN (LOWER(consent) ='no') THEN consent
      ELSE
      NULL
    END
      AS consent_resolved,
      *
    FROM
      `table_name` ))
WHERE
  consent_resolved IS NULL;

Any suggestions how do I resolve this? There are 50 million rows in my Big Query table.

1
You can try materialising few cte's as you are using multiple analytical functions which in turn are using ORDER BY clause. ORDER BY clause is memory intensive function, and mostly it is the reason for the error. - SAL

1 Answers

1
votes

Whilst I do not possess sample data to optimise your query, I will explain you important points.

There are specific operations in BigQuery that required data to exist on a single node. So, when data can not longer fit on that node, you will get “Resources exceeded during query execution” error and OVER() is one of these operations. As I can see, your query perform a significant amount of OVER() and ORDER BY, which is also an expensive resource (resource-wise).

Therefore, in order to optimise your query, you could separate your data into shards using a WITH clause. In addition, according to the documentation, there are several points which leads your query to have a better performance, such as:

  • Input data and data sources (I/O): How many bytes does your query read?

  • Communication between nodes (shuffling): How many bytes does your query pass to the next stage?

  • How many bytes does your query pass to each slot?

  • Computation: How much CPU work does your query require?

  • Outputs (materialization): How many bytes does your query write?

  • Query anti-patterns: Are your queries following SQL best practices?