0
votes

In BigQuery, the following error appears after execution of a scheduled query:

Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex

I admit the query is quite complex, with multiple OVER () clauses including PARTITION BY and ORDER BY in the OVER() clauses, which are expensive from a computational perspective. However this is needed to accomplish the desired result. I need this OVER() clauses to get the desired resulting table. The query is appr 50GB.

The scheduled query queries data over 4 days of Google Analytics-related data.

However, remarkably, when I'm running the same query on a manual basis, the query executes without any problems (appr 35 seconds query time). Even when I manually execute the query with 365 days of GA-data, the query executes successfully. This query is 4TB (appr 280 seconds query time).

Does anyone know why scheduled queries fail in my case while manual queries can be executed without errors? And - given the fact that the scheduling is important - does anyone know if there is a fix so that the scheduled query can be executed without errors?

Basically, it's this query, see below. Note that I hided the input table so reduced the query length a bit. The input table is just a collection of SELECT queries to merge multiple input tables using UNION ALL.

Note as well that I am trying to connect hits from separate sources, Firebase Analytics (app data) and Universal Analytics (web data), into custom session id's where this is needed, and if this is not needed use the regular visit id's from GA.

SELECT
  *,
  MAX(device) OVER (PARTITION BY country, date, custvisitid_unified RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS device_new,
IF
  (mix_app_web_session = 'mixed',
    CONCAT('mixed_',MAX(app_os) OVER (PARTITION BY country, date, custvisitid_unified RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)),
    browser) AS browser_new,
  MAX(channel) OVER (PARTITION BY country, date, custvisitid_unified RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS channel_new,
IF
  (mix_app_web_session = 'mixed',
    MAX(app_os) OVER (PARTITION BY country, date, custvisitid_unified RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
    os) AS os_new
FROM (
  SELECT
    *,
  IF
    (COUNT(DISTINCT webshop) OVER (PARTITION BY country, date, custvisitid_unified) > 1,
      'mixed',
      'single') AS mix_app_web_session
  FROM ( # define whether custvisitid_unified has hits from both app and web
    SELECT
      *,
    IF
      (user_id_anonymous_wide IS NULL
        AND mix_app_web_user = 'single',
        custvisitid,
        CONCAT(MAX(custvisitid) OVER (PARTITION BY country, date, user_id_anonymous_wide RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),cust_session_counter)) AS custvisitid_unified
    FROM ( # create custvisitid_unified, in which the linked app and web hits have been assigned a unified custom visitid. Only apply a custom visitid to user_id_anonymous_wide = 'mixed' hits (since it is a bit tricky), otherwise just use the regular visitid from GA
      SELECT
        *,
      IF
        (COUNT(DISTINCT webshop) OVER (PARTITION BY country, date, user_id_anonymous_wide) > 1,
          'mixed',
          'single') AS mix_app_web_user,
        (COUNT(new_session) OVER (PARTITION BY country, date, user_id_anonymous_wide ORDER BY timestamp_microsec)) + 1 AS cust_session_counter
      FROM ( # define session counter
        SELECT
          *,
        IF
          ((timestamp_microsec-prev_timestamp_microsec) > 2400000000,
            'new',
            NULL) AS new_session
        FROM ( # Where timestamp is greater than 40 mins (actually 30 mins, but some margin is applied to be sure)
          SELECT
            *,
          IF
            (user_id_anonymous_wide IS NOT NULL,
              LAG(timestamp_microsec,1) OVER (PARTITION BY country, date, user_id_anonymous_wide ORDER BY timestamp_microsec),
              NULL) AS prev_timestamp_microsec
          FROM ( # define previous timestamp to calculate difference in timestamp between consecutive hits
            SELECT
              *,
              MAX(user_id_anonymous) OVER (PARTITION BY country, date, custvisitid RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS user_id_anonymous_wide,
            IF
              (webshop = 'appshop',
                os,
                NULL) AS app_os # user_id_anonymous_wide: define the user_id_anonymous values for all hits within the existing sessionid (initially only in 1 of the hits)
            FROM (

                # SELECT many dimensions FROM multiple tables (which resulted in 1 table with the use of UNION ALL's

                  ) ))))))
1
I'd recommend you to optimize the query using GROUP BYs and then row-level aggregations instead of table level aggregations. Can you add the query here?Sabri Karagönen
@SabriKaragönen, thanks for the comment. I included the query in the description. Note that I hided the base query for the input table, because this part is quite lengthy and not related to the issue as described (no complex calculations applied there, just selected input fields).Timo Rietveld

1 Answers

0
votes

An update: I fixed the issue by splitting up the query in 2 queries.