0
votes

I need to execute a single query over all my projects in BigQuery. The list of projects may increase every day, so I need to do this job dynamically. All tables I need to query share the same schema, but each table is in a different project with different dataset names.

I thought to create a table to save all the project.dataset.table I need to query. Then I could execute a query where in "from" I could take the locations from the mentioned table.

But actually I don't know how to do that. Or if there is another solution I can implement...

1

1 Answers

0
votes

If you are running queries on multiple accounts, you have to somehow be explicit about specifying those accounts and their credentials in some centralized location.

Assuming you can create independent Service Account JSONs for each of those accounts, then you can simply have a local script that can do the job for you. In general, all that script really needs to do is to go over accounts and reset the environment variable GOOGLE_APPLICATION_CREDENTIALS to point to the specific account before it runs the query.

For instance, if you use Python, then something roughly on the lines of this:

import os
from google.cloud import bigquery

accounts = [
   {
      "account_name": "xyz", 
      "credentials_json": "/path/to/xyz/credentials.json", 
      "dataset_name": "dataset", 
      "table_name": "table_name"
   },
   {
      "account_name": "xyz", 
      "credentials_json": "/path/to/xyz/credentials.json", 
      "dataset_name": "dataset", 
      "table_name": "table_name"
   }
]

generic_query = '''
   select * from `{dataset_name}.{table_name}` where 1=1;
'''

def worker(account_info):
   '''
   your worker function which takes an account_info and runs the query.
   '''
   # set the credentials file env variable based on the account info
   os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = account_info.get("credentials_json")
   client = bigquery.Client()
   query = generic_query.format(dataset_name = account_info.get("dataset_name"), table_name = account_info.get("table_name"))
   query_job = client.query(query)
   rows = query_job.result()
   for row in rows:
      print(account_info.get("account_name"), row)
   return


if __name__ == "__main__":
   #--run through your accounts and submit to the worker
   while accounts:
      account_info = accounts.pop(0)
      worker(account_info)

Hope it helps.