BigQuery cost breakdown by user in a specific date range

Question

I created logging export of BigQuery activities using Stackdriver Logging in the cloud console.

I found the following query, in google documentation, to extract BigQuery cost breakdown by user:

#standardSQL
WITH data as
(
  SELECT
    protopayload_auditlog.authenticationInfo.principalEmail as principalEmail,
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent as jobCompletedEvent
  FROM
    `AuditLogsDataSet.cloudaudit_googleapis_com_data_access_YYYYMMDD`
)
SELECT
  principalEmail,
  FORMAT('%9.2f',5.0 * (SUM(jobCompletedEvent.job.jobStatistics.totalBilledBytes)/1000000000000)) as Estimated_USD_Cost
FROM
  data
WHERE
  jobCompletedEvent.eventName = 'query_job_completed'
GROUP BY principalEmail
ORDER BY Estimated_USD_Cost DESC

I can't use TABLE_DATE_RANGE in a standard SQL query. There is a way to perform the query above in a specific date range?

dsesto dsesto · Accepted Answer · 2018-07-09T08:20:41

I see that you were already able to figure it out by using Wildcard Tables in Standard SQL thanks to the comment by @ElliotBrossard; in any case, I would like to add some more information for any other user that is interested about the topic.

In order to query the data from multiple tables at once in BigQuery, the wildcard functionality is used. The approach to use Wildcards in Legacy SQL and Standard SQL is different, so let's cover both options separately (although, take into account that Standard SQL is the preferred SQL dialect for BigQuery):

In Legacy SQL, you should use table wildcard functions, such as the one that you asked for, TABLE_DATE_RANGE(). These functions are specified in the FROM clause, and there are three different functions available, which you can check in the documentation linked.

As an example, let's say I have the tables wildcard1, wildcard2 and wildcard3 in the dataset project:dataset. Running the following query (using a wildcard function from the ones available), would return the content of all tables whose ID contains the word wildcard:

#legacySQL
SELECT * FROM (TABLE_QUERY([project:dataset], 'table_id CONTAINS "wildcard"'))

In Standard SQL, instead, you should use Wildcard Tables. The functionality is equivalent to the one of wildcard functions in Legacy SQL, the idea behind them is identical (you will only be billed for the queried data [tables matching the wildcard]), and regarding limitations, cached results are not supported in either of the possibilities.

The same sample query as before, but now using Wildcard Tables in Standard SQL would be:

#standardSQL
SELECT * FROM `project.dataset.wildcard*`

BigQuery cost breakdown by user in a specific date range

1 Answers