How to build modular queries in BigQuery?

Question

I have some useful queries, I'd like to build a few more complex ones that needs them as sub queries. Can I call them by name ?

I'v seen the 'save view' option and was able to build new queries that used saved views.

Does this method refreshes the saved view each time a top query uses it, by re-executing the relevant queries ? or is it just a named query result, that I have to rerun each time to refresh ?

other suggestions to build queries in modular fashion ? For example when I change the days range I select from I want all subqueries to use the range. In programming it's either using promoters or globals, how to do this in BigQuery ?

Alexandre Moraes Alexandre Moraes · Accepted Answer · 2020-07-31T09:00:24

Whilst it is very diffcult to address your questions due to its broadness. I will answer them with general guidelines and examples for each doubt.

Regarding your first question, about subqueries and calling queries by an alias. I have 2 considerations about these:

1) You can use subqueries with WITH. So, you perform your transformations in the data, save it in a temporary table and reference it in the following (sub)query. Moreover, every time you run the code, all the queries will be executed. Below is an example,

WITH data as (
SELECT "Alice" AS name, 39 AS age, "San Francisco" AS city  UNION ALL
SELECT "Marry" AS name, 35 AS age, "San Francisco" AS city UNION ALL
SELECT "Phill" AS name, 18 AS age, "Boston" AS city UNION ALL
SELECT "Robert" AS name, 10 AS age, "Tampa" AS city 
),
greater_30 AS (
SELECT * FROM data 
WHERE age > 30
),
SF_30 AS (
SELECT * FROM greater_30
WHERE city = "San Francisco"
)
SELECT * FROM SF_30

and the output,

Row name    age city
1   Alice   39  San Francisco
2   Marry   35  San Francisco

2) Create a Stored Procedure: procedures are blocks of statements which can be called from other queries and also executed recursively ( call one procedure inside other). In order to create and store a procedure you have to specify the project and dataset where it will be saved. As well as, its name. Below is an example (using a BigQuery public dataset),

#creating the procedure
CREATE or replace PROCEDURE project_id.ataset.chicago_taxi(IN trip_sec INT64, IN price INT64)
BEGIN
CREATE TEMP TABLE taxi_rides AS 
SELECT * FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips` 
WHERE trip_seconds > trip_sec and fare >price
LIMIT 10000
;
END;

Now, you can call the procedure using CALL. As follows:

DECLARE trip_sec INT64 DEFAULT 30;
DECLARE price INT64 DEFAULT 30;
CALL `project_id.ataset.chicago_taxi`(trip_sec, price);

SELECT max(fare) AS max_fare,payment_type FROM taxi_rides
GROUP BY  payment_type

And the output,

Row max_fare    payment_type
1   463.45  Cash
2   200.65  Credit Card

Notice that the procedure is saved within the dataset. Then we use CALL to call it and use its output (a temporary table) in the next select statement. I must point that every time the procedure is invoked, it executes the query.

Regarding your question about saved views: the view is updated every time you run it. Please refer to the documentation.

Finally, about the last question using parameters and globals in queries: you can use scripting in BigQuery in order to DECLARE and SET a variable. So, you can take advantages when changing filter parameters for example. Below there is an usage example using a public a public dataset,

DECLARE time_s timestamp;
SET time_s= timestamp(DATETIME "2016-01-01 15:30:00");
SELECT * FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips` 
WHERE trip_start_timestamp > time_s
LIMIT 10000

Pay attention that every time the filter needs to be changed, it is possible to do it from the SET statement.

Note: If you have any specific question, please open another thread or you can ask me in the comment section.

How to build modular queries in BigQuery?

1 Answers