
I currently run a SQL query to extract data from a public BigQuery dataset into a table; from there I can easily use the Export function to generate an Avro file and save it to GCS. How can I generate this file programmatically? I have used the BigQuery API to read a table into a Pandas DataFrame. Is the best option to read into a Pandas DataFrame and then export it to Avro, or is there a better way to do it?
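
For reference, the DataFrame route I'm considering looks roughly like this (a rough sketch only: the query is a placeholder against a public dataset, the Avro writing uses the fastavro package, and the schema handling is oversimplified by treating every column as a string):

from google.cloud import bigquery
from fastavro import writer, parse_schema

client = bigquery.Client()

# Placeholder query against a public dataset
query = "SELECT * FROM `bigquery-public-data.usa_names.usa_1910_2013` LIMIT 1000"
df = client.query(query).to_dataframe()

# Naive schema: every column written as a nullable string
schema = parse_schema({
    "name": "export",
    "type": "record",
    "fields": [{"name": col, "type": ["null", "string"]} for col in df.columns],
})

with open("export.avro", "wb") as out:
    writer(out, schema, df.astype(str).to_dict("records"))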

Why don't you export to Avro directly? cloud.google.com/bigquery/docs/… – ECris
@ECris, will you be able to post your suggestion as an answer to this thread for the benefit of the community? – Philipp Sh
@PhilippSh, added. – ECris

1 Answer


Why don't you export to Avro directly? This will do a table export to Avro into a GCS bucket.

from google.cloud import bigquery

client = bigquery.Client()

project = 'your-project-id'  # placeholder: your GCP project ID
dataset_id = 'cdc'
table_id = 'project_tycho_reports'
bucket_name = 'bucketname'

destination_uri = 'gs://{}/{}'.format(bucket_name, 'fileName')
dataset_ref = client.dataset(dataset_id, project=project)
table_ref = dataset_ref.table(table_id)

# Request Avro instead of the default CSV output
job_config = bigquery.job.ExtractJobConfig()
job_config.destination_format = bigquery.DestinationFormat.AVRO

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    job_config=job_config,
    location="EU",  # must match the source dataset's location
)
extract_job.result()  # wait for the export to finish

I saw that it is also possible to specify compression (not available when exporting from the UI), something like job_config.compression = bigquery.Compression.SNAPPY. Hope it helps.
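
For example, building on the snippet above, the compressed variant would look something like this (untested sketch, reusing the same table_ref and destination_uri):

# Same extract as above, but with Snappy compression applied to the Avro files
job_config = bigquery.job.ExtractJobConfig()
job_config.destination_format = bigquery.DestinationFormat.AVRO
job_config.compression = bigquery.Compression.SNAPPY

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    job_config=job_config,
    location="EU",
)
extract_job.result()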