I am using scripts to run views and use the query results to update tables in BigQuery. I then use these tables in Data Studio for tables/visualizations.

An issue I've noticed is that sometimes my views will suddenly throw the error "Resources exceeded during query execution". My Google Apps Script still runs successfully, but the table remains unchanged because the view couldn't be executed. As a result, the data in my Data Studio tables and visualizations goes stale, and I have no way of knowing this unless I open the view for editing and see that it is throwing the error.

Is there any way to add something to my script so that the script will fail or notify me when BigQuery throws the "Resources exceeded during query execution" error? That way I won't unknowingly have stale data because of the error.

  var configuration = {
    "query": {
      "useQueryCache": false,
      "destinationTable": {
        "projectId": "abc-123",
        "datasetId": "report_tables",
        "tableId": "dashboard_1"
      },
      "writeDisposition": "WRITE_TRUNCATE",
      "createDisposition": "CREATE_IF_NEEDED",
      "allowLargeResults": true,
      "useLegacySql": false,
      "query": "SELECT * FROM `abc-123.report_tables.dashboard_1_view`"
    }
  };

  var job = {
    "configuration": configuration
  };

  var jobResult = BigQuery.Jobs.insert(job, "abc-123");
  Logger.log(jobResult);

I'm hoping you can help me add something to the above script so that when resources are exceeded in the BigQuery view, the script fails to execute or notifies me in some way, so I know to go correct the issue in BigQuery.

Thanks for the help!

I've never used BigQuery, but it looks like you can check .status.errors on the job; if there are any errors, they will appear in the response. Perhaps you can play with this a bit in the API Explorer. - Cooper

1 Answer


Your function is succeeding because all jobs/queries on BigQuery are asynchronous. The line var jobResult = BigQuery.Jobs.insert(job, "abc-123"); only submits the job to the BigQuery service to begin execution; your function then returns without waiting for the result.

So you need to poll the status of the job while it's executing, and then handle any errors if it fails, with something like this:

  [..]
  // Submit the job, then poll BigQuery until it reports the job as DONE.
  var job = BigQuery.Jobs.insert(job, "abc-123");
  Logger.log(job.status.state);
  var jobId = job.jobReference.jobId;

  // Check on the status of the query job.
  var sleepTimeMs = 500;
  while (job.status.state !== "DONE") {
    Utilities.sleep(sleepTimeMs);
    job = BigQuery.Jobs.get("abc-123", jobId);
    Logger.log(job.status.state);
  }

  // DONE does not mean success: a failed job is also DONE, with the
  // failure details in job.status.errors.
  if (job.status.errors != null && job.status.errors.length > 0) {
    Logger.log("FAILED:" + job.status.errors);
  } else {
    Logger.log("SUCCEEDED");
  }
  [..]
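
If you want to be actively notified rather than having to check the logs, you could also send yourself an email from the failure branch using Apps Script's built-in MailApp service, and throw so the script execution itself fails (which also lets Apps Script's trigger failure notifications fire). A minimal sketch; the recipient address and message wording are placeholders:

  [..]
  if (job.status.errors != null && job.status.errors.length > 0) {
    // Placeholder recipient: swap in your own address.
    MailApp.sendEmail(
      "you@example.com",
      "BigQuery refresh failed: " + jobId,
      "The view query failed with:\n" + job.status.errors
    );
    // Throwing makes the script run itself fail instead of ending quietly.
    throw new Error("BigQuery job " + jobId + " failed");
  }
  [..]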

Sample output (failure):

[19-06-03 19:54:48:557 AEST] RUNNING
[19-06-03 19:54:49:161 AEST] RUNNING
[19-06-03 19:54:49:789 AEST] RUNNING
[... ~44 further RUNNING lines elided ...]
[19-06-03 19:55:18:002 AEST] DONE
[19-06-03 19:55:18:003 AEST] FAILED:{"reason":"resourcesExceeded","message":"Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 121% of limit.\nTop memory consumer(s):\n  ORDER BY operations: 99%\n  other/unattributed: 1%\n"}

Sample output (success):

[19-06-03 19:59:48:206 AEST] RUNNING
[19-06-03 19:59:48:820 AEST] RUNNING
[19-06-03 19:59:49:433 AEST] RUNNING
[19-06-03 19:59:50:015 AEST] DONE
[19-06-03 19:59:50:016 AEST] SUCCEEDED

Something to consider is that using Google Apps Script ties all of this to your personal Google account, i.e. not a service account. That might be fine for you, but in an enterprise setting it's a bit of a no-no.

Finally, I'd probably move your scripts into Cloud Build. I'd argue that it's much more powerful, more maintainable, easier to use, and more flexible.
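
As a rough illustration, the refresh could become a single build step that runs the same query with the bq CLI; bq exits non-zero on errors like "Resources exceeded", which fails the build, and Cloud Build can alert you on failed builds. This is only a sketch, and the builder image and flags are assumptions you'd adapt:

  # cloudbuild.yaml (sketch)
  steps:
    - name: "gcr.io/google.com/cloudsdktool/cloud-sdk"
      entrypoint: "bq"
      args:
        - "query"
        - "--use_legacy_sql=false"
        - "--replace"
        - "--destination_table=abc-123:report_tables.dashboard_1"
        - "SELECT * FROM `abc-123.report_tables.dashboard_1_view`"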