1
vote

Would someone please let me know if there is a way to save a BigQuery result in JSON or Avro format?

I am using the following code to run a query on a BigQuery table.

from google.cloud import bigquery

client = bigquery.Client.from_service_account_json('/Users/gaurang.shah/Downloads/fb3735b731b9.json')

job_config = bigquery.QueryJobConfig()
job_config.priority = bigquery.QueryPriority.BATCH
sql = """
    SELECT *
    FROM `bigquery-public-data.samples.shakespeare`
    LIMIT 1;
"""
location = 'US'
query_job = client.query(sql, location=location, job_config=job_config)
query_job = client.get_job(query_job.job_id, location=location)  # refresh the job state
print(query_job.result())

I am trying to export the BigQuery table without using GCS in between, and this is one way I think I could achieve that.

The other way I can think of is the bq command-line tool. However, I am not sure whether it has any limits on how many queries I can run or how much data I can retrieve.
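For reference, the Python client can also pull query results down directly, with no GCS step at all, by iterating the rows and writing them out locally. A minimal sketch of that idea, assuming newline-delimited JSON is acceptable (the output path is just an example):

from google.cloud import bigquery
import json

client = bigquery.Client.from_service_account_json('/Users/gaurang.shah/Downloads/fb3735b731b9.json')
query_job = client.query('SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 10')

# Stream the result rows and write each one as a JSON line.
with open('results.json', 'w') as f:
    for row in query_job.result():
        f.write(json.dumps(dict(row.items())) + '\n')

This works for modest result sets, though it pulls everything through the client rather than using BigQuery's bulk export path.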


1 Answer

3
votes

You need to first run your query and write the results to a table, and then hook into the BigQuery export/extract API, which can export that table to GCS in the format you want.
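For the first step, a query job can be pointed at a destination table via the job config. A minimal sketch, assuming a dataset of your own called my_dataset (the table name is just an example):

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical destination table you own to hold the query results.
dest_table_ref = client.dataset('my_dataset').table('shakespeare_results')

job_config = bigquery.QueryJobConfig()
job_config.destination = dest_table_ref  # write results here instead of a temp table

sql = """
    SELECT *
    FROM `bigquery-public-data.samples.shakespeare`
    LIMIT 100
"""
query_job = client.query(sql, location='US', job_config=job_config)
query_job.result()  # wait for the query to finish

Then the table can be extracted to GCS. For example, here's CSV: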

from google.cloud import bigquery

client = bigquery.Client()
bucket_name = 'my-bucket'  # replace with your own GCS bucket
project = 'bigquery-public-data'
dataset_id = 'samples'
table_id = 'shakespeare'

destination_uri = 'gs://{}/{}'.format(bucket_name, 'shakespeare.csv')
dataset_ref = client.dataset(dataset_id, project=project)
table_ref = dataset_ref.table(table_id)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location='US')  # API request
extract_job.result()  # Waits for job to complete.

print('Exported {}:{}.{} to {}'.format(
    project, dataset_id, table_id, destination_uri))
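
Since the question asks for JSON or Avro specifically, the extract job also accepts a destination format on its job config. A minimal sketch reusing the table_ref and bucket_name above (the output name is just an example):

job_config = bigquery.ExtractJobConfig()
job_config.destination_format = bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON  # or AVRO

extract_job = client.extract_table(
    table_ref,
    'gs://{}/shakespeare.json'.format(bucket_name),
    job_config=job_config,
    location='US')
extract_job.result()  # wait for the export to finish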

See the BigQuery documentation on exporting table data for more details.