I currently run a SQL query to extract data from a public BigQuery dataset into a table; from there I can easily use the Export function to generate an Avro file and save it to GCS. How can I generate this file programmatically? I have used the BigQuery API to read a table into a Pandas DataFrame. Is the best option to read into a Pandas DataFrame and then export it to Avro, or is there a better way to do it?
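For context, the Pandas route I mean looks roughly like this (a sketch only; the query against the public shakespeare sample and the use of the fastavro library are illustrative assumptions, not my exact code):

from google.cloud import bigquery
from fastavro import writer, parse_schema

client = bigquery.Client()

# Read the public table into a Pandas DataFrame (hypothetical query).
df = client.query(
    'SELECT word, word_count '
    'FROM `bigquery-public-data.samples.shakespeare` LIMIT 1000'
).to_dataframe()

# Hand-written Avro schema matching the two selected columns.
schema = parse_schema({
    'name': 'shakespeare',
    'type': 'record',
    'fields': [
        {'name': 'word', 'type': 'string'},
        {'name': 'word_count', 'type': 'long'},
    ],
})

# Convert rows to plain Python types and write a local Avro file.
records = [
    {'word': str(row.word), 'word_count': int(row.word_count)}
    for row in df.itertuples()
]
with open('export.avro', 'wb') as out:
    writer(out, schema, records)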
0 votes
1 Answer
2 votes
Why don't you export to Avro directly? The following will do a table export to Avro into a GCS bucket.
from google.cloud import bigquery

client = bigquery.Client()

project = 'your-project-id'  # replace with your GCP project ID
dataset_id = 'cdc'
table_id = 'project_tycho_reports'
bucket_name = 'bucketname'
destination_uri = 'gs://{}/{}'.format(bucket_name, 'fileName')

dataset_ref = client.dataset(dataset_id, project=project)
table_ref = dataset_ref.table(table_id)

# Configure the extract job to write Avro instead of the default CSV.
job_config = bigquery.job.ExtractJobConfig()
job_config.destination_format = bigquery.DestinationFormat.AVRO

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    job_config=job_config,
    location="EU",  # must match the dataset's location
)
extract_job.result()  # wait for the export to finish
I saw that there is also the possibility to specify compression (not available when exporting from the UI), something like job_config.compression = bigquery.Compression.SNAPPY.
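Added to the same job config it would look roughly like this (a sketch; note that Snappy compression only applies to Avro exports):

# Same extract job as above, but with Snappy-compressed Avro output.
job_config = bigquery.job.ExtractJobConfig()
job_config.destination_format = bigquery.DestinationFormat.AVRO
job_config.compression = bigquery.Compression.SNAPPY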
Hope it helps.