3 votes

I am trying to run a SQL script from the Linux command line against a BQ table destination. This script will be used for multiple dates, clients, and BQ table destinations, so it requires parameters in my BQ API command-line calls (the --parameter flag). Now, I have followed this link to learn about parameterized queries: https://cloud.google.com/bigquery/docs/parameterized-queries, but it is of limited help when it comes to declaring a table name.

My SQL script, called Advertiser_Date_Check.sql, is the following:

#standardSQL
SELECT *
FROM (SELECT *
      FROM @variable_table
      WHERE CAST(_PARTITIONTIME AS DATE) = @variable_date) as final
WHERE final.Advertiser IN UNNEST(@variable_clients)

Where the parameter variables represent the following:

  • variable_table: The BQ table destination that I want to query
  • variable_date: The date that I want to pull from the BQ table
  • variable_clients: An ARRAY of specific clients that I want to pull from the data for that date

Now, my Linux command line for the BQ call is the following:

TABLE_NAME=table_name_example
BQ_TABLE=$(echo '`project_id.dataset_id.'$TABLE_NAME'`')
TODAY=$(date +%F)

/bin/bq query --use_legacy_sql=false    \
       --parameter='variable_table::'$BQ_TABLE''  \
       --parameter=variable_date::"$TODAY"    \
       --parameter='variable_clients:ARRAY<STRING>:["Client_1","Client_2","Client_3"]'  \
       "`cat /path/to/script/Advertiser_Date_Check.sql`" 

The parameters @variable_date and @variable_clients have worked just fine in the past when it was just the two of them. However, since I want to run this exact SQL on various tables in a loop, I created a parameter called variable_table. Parameterized queries have to be in standard SQL, so the table name needs to follow this format:

`project_id.dataset_id.table_name`

Whenever I try to run this on the command line, I usually get the following error:

Error in query string: Error processing job ... : Syntax error: Unexpected "@" at [4:12]

This points at the parameter @variable_table, so BigQuery is having a hard time processing it as a table name. In past attempts, I have even gotten the error:

project_id.dataset_id.table_name: command not found

But that was mostly due to a badly formed table destination name; the first error is the most common occurrence.

Overall, my questions regarding this matter are:

  1. How do I reference a BQ table as a parameter in the FROM clause of a parameterized query on the command line (as I try to do with @variable_table)? Is it even possible?
  2. Do you know of other methods to run a query on multiple BQ tables from the command line besides the way I am currently doing it?

Hope this all makes sense and thank you for your assistance!

From the documentation that you linked, "Parameters cannot be used as substitutes for identifiers, column names, table names, or other parts of the query." – Elliott Brossard
Oh geez. I must have just missed that part. It's been quite a long day. Thanks for the reply. – C.Theisen
I think what might work for you, though, is performing the injection of the table name as a regular shell variable (instead of a query parameter). You'd want to make sure that you trust its contents, though, or that you are building the string yourself in order to avoid SQL injection. – Elliott Brossard
@ElliottBrossard - would you mind promoting your comment to an answer for posterity? I made the same mistake a few months ago :-) – Graham Polley
Done :) I think we did update this topic recently, so you might have seen it before there was the explicit note about parameters only being able to replace expressions. – Elliott Brossard

2 Answers

6 votes

From the documentation that you linked:

Parameters cannot be used as substitutes for identifiers, column names, table names, or other parts of the query.

I think what might work for you in this case, though, is injecting the table name as a regular shell variable (instead of a query parameter). You'd want to make sure that you trust its contents, or that you are building the string yourself, in order to avoid SQL injection. One approach is to have hardcoded constants for the table names and then choose which one to insert into the query text based on the user input, as sketched below.
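A minimal sketch of that idea, building on the command from the question. The whitelist entries, the use of @variable_table as a plain placeholder token inside Advertiser_Date_Check.sql, and the sed substitution are illustrative assumptions, not the only way to wire it up; adjust the project and dataset names to your own:

TABLE_NAME=table_name_example

# Hardcoded whitelist: only known-good identifiers ever reach the
# query text, which avoids SQL injection via the table name.
case "$TABLE_NAME" in
  table_name_example|another_known_table) ;;
  *) echo "Unknown table: $TABLE_NAME" >&2; exit 1 ;;
esac

BQ_TABLE="\`project_id.dataset_id.${TABLE_NAME}\`"
TODAY=$(date +%F)

# Splice the trusted table identifier into the query text itself;
# the date and client list stay as real query parameters.
SQL=$(sed "s/@variable_table/${BQ_TABLE}/" /path/to/script/Advertiser_Date_Check.sql)

bq query --use_legacy_sql=false \
    --parameter=variable_date:DATE:"$TODAY" \
    --parameter='variable_clients:ARRAY<STRING>:["Client_1","Client_2","Client_3"]' \
    "$SQL"

Note that bq treats an untyped parameter as a STRING; giving variable_date an explicit DATE type keeps the comparison with CAST(_PARTITIONTIME AS DATE) unambiguous.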

0 votes

I thought I would just post my example here, which only covers your question about creating a "dynamic table name", but you can also use my approach for your other variables. My approach is to build the table name directly in Python just before making the BigQuery API call, using Python's datetime module (assuming you want your variables to be time-based).

Create a BigQuery table via the Python BQ API:

from google.colab import auth
from datetime import datetime
from google.cloud import bigquery

auth.authenticate_user()
# Time-based suffix that makes each destination table name unique
now = datetime.now()
current_time = now.strftime("%Y%m%d%H%M")

project_id = '<project_id>'
client = bigquery.Client(project=project_id)

# Destination table name built dynamically from the current time
table_id = "<project_id>.<dataset_id>.table_" + current_time
job_config = bigquery.QueryJobConfig(destination=table_id)

sql = """
SELECT
    dataset_id,
    project_id,
    table_id,
    CASE
      WHEN type = 1 THEN 'table'
      WHEN type = 2 THEN 'view'
      WHEN type = 3 THEN 'external'
      ELSE '?'
    END AS type,
    DATE(TIMESTAMP_MILLIS(creation_time)) AS creation_date,
    TIMESTAMP_MILLIS(creation_time) AS creation_time,
    row_count,
    size_bytes,
    round(safe_divide(size_bytes, (1000*1000)),1) as size_mb,
    round(safe_divide(size_bytes, (1000*1000*1000)),3) as size_gb
FROM `<project_id>.<dataset_id>.__TABLES__`
ORDER BY dataset_id, table_id ASC;
"""

# Run the query and wait for it to finish writing the destination table
query_job = client.query(sql, job_config=job_config)
query_job.result()
print("Query results loaded to the table {}".format(table_id))

# Output: 
# Query results loaded to the table <project_id>.<dataset_id>.table_202101141450

Feel free to copy and test it within a Google Colab notebook. Just fill in your own:

  • <project_id>
  • <dataset_id>
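If you also want to pass the date and client list from Python rather than building strings, the client library supports proper query parameters as well: bigquery.QueryJobConfig accepts a query_parameters list of bigquery.ScalarQueryParameter and bigquery.ArrayQueryParameter objects, which would cover @variable_date and @variable_clients from the question without any string substitution.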