“Error: Resources exceeded during query execution” resulting from SQL call using FLATTEN

Question

I am currently attempting to run the following SQL script in BigQuery (with the goal of saving the table out) but am getting a SQL processing error relating to the capacity after I start the query without a row return size limit. The error is the common one: "Error: Resources exceeded during query execution."

SELECT date, 
       Concat(fullvisitorid, String(visitid)) AS unique_visit_id, 
       visitid, 
       visitnumber, 
       fullvisitorid, 
       totals.pageviews, 
       totals.bounces, 
       hits.page.pagepath, 
       hits.page.pagetitle, 
       device.devicecategory, 
       device.browser, 
       device.browserversion, 
       hits.customvariables.index, 
       hits.customvariables.customvarname, 
       hits.customvariables.customvarvalue, 
       hits.time 
FROM   (Flatten([XXXXXXXX.ga_sessions_20140711], hits.time)) 
WHERE  hits.customvariables.index = 4 
ORDER  BY unique_visit_id DESC, 
          hits.time ASC

The job ID that was provided for the job is: ua-web-analytics:job_60fxNnmo9gZ23kaji50v3qH9dXs. I have read the other posts on the topic of these errors such as this post which focuses on the resource errors observed completing a join. I suspect that the issue right now is with the use of FLATTEN, and am working through some different approaches. That said, I am concerned because, in future, this query may be run on 30 or 60 days together (versus just the single day that I am prototyping on right now) which will dramatically increase the data size to over 500GB to 1TB. The goal of the above query was to generate a table which I could save out and then operate on. Unfortunately, doing this in an Ad Hoc manner seems somewhat problematic. Has anyone else encountered resource constraints when using a similar SQL query? For context, the table that is being queried over is about 17.2 GB in size, with just over a million rows.

Got it. Thanks! Key point I noticed is that a table needs to be specified in advance in order to use the "Allow Large Results" option flag! — Nathaniel Payne

Jordan Tigani Jordan Tigani · Accepted Answer · 2014-07-14T15:15:43

As @Pentium10 mentioned, setting allow large results will allow you to return the larger results from the flattened query. Usually the signal that you should use "allow large results" is that you see a "result too large" error.

However, there is another part of your query that is unparallelizable: the ORDER BY operation. Is this required? Usually, we've found that most of the time when ORDER BY is used on large tables, what people really want is an ORDER BY ... LIMIT (which can be done efficiently and in parallel). Or they are just adding the ORDER BY because it makes it easier to eyeball the results. If you can drop the ORDER BY it will likely make your query faster and scale better as the data size increases.

“Error: Resources exceeded during query execution” resulting from SQL call using FLATTEN

2 Answers