2 votes

I am using the BigQuery Python client library to export data from BigQuery tables into GCS in CSV format.

I have given a wildcard pattern in the destination URI, assuming some tables can be larger than 1 GB.

Sometimes, even though the table is only a few MB, the export creates multiple files; other times it creates just one file.

Is there a logic behind this?

My export workflow is the following:

    project = bq_project
    dataset_id = bq_dataset_id
    table_id = bq_table_id
    bucket_name = bq_bucket_name
    workflow_name = workflow_nm
    csv_file_nm = workflow_nm + "/" + csv_file_prefix_in_gcs + '*'
    client = bigquery.Client()
    destination_uri = "gs://{}/{}".format(bucket_name, csv_file_nm)
    dataset_ref = client.dataset(dataset_id, project=project)
    table_ref = dataset_ref.table(table_id)
    destination_table = client.get_table(dataset_ref.table(table_id))
    configuration = bigquery.job.ExtractJobConfig()
    configuration.destination_format = 'CSV'
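The snippet above builds the destination URI by plain string concatenation, with the wildcard always appended. A small helper makes that choice explicit; `build_destination_uri` and its parameters are hypothetical names for illustration, and the `extract_table` call in the comment assumes the `client` and `configuration` objects are set up as above:

```python
def build_destination_uri(bucket_name, workflow_nm, csv_file_prefix, shard=True):
    """Build the GCS destination URI for a BigQuery extract job.

    Hypothetical helper: when shard is True a trailing '*' is appended,
    which lets BigQuery split the export into multiple files (and is
    required when the exported data exceeds 1 GB).
    """
    csv_file_nm = workflow_nm + "/" + csv_file_prefix + ("*" if shard else "")
    return "gs://{}/{}".format(bucket_name, csv_file_nm)

# The extract job itself would then be started with the real client API, e.g.:
# client.extract_table(table_ref, destination_uri, job_config=configuration)
```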
Which wildcard are you using? gs://my-bucket/file-name-*.json or gs://my-bucket/file-name-<worker number>-*.json? - Chris32
csv_file_nm=workflow_nm+"/"+csv_file_prefix_in_gcs - Sreekanth

1 Answer

1 vote

I think this is intended behavior of the export. The BigQuery export documentation states the following:

When you export data to multiple files, the size of the files will vary.

This matches the behavior you are seeing: with a wildcard URI, BigQuery decides how many files to write, and the count and sizes are not deterministic even for small tables.
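Since the wildcard is only required for exports larger than 1 GB, one way to keep small tables in a single file is to pick the URI based on the table size the API reports. This is a sketch under that assumption; `pick_destination_uri` is a hypothetical helper, not part of the BigQuery API:

```python
ONE_GB = 1 * 1024 ** 3  # documented single-URI export limit

def pick_destination_uri(num_bytes, single_uri, wildcard_uri):
    """Return a single-file URI for small tables, a wildcard URI otherwise.

    Hypothetical helper: num_bytes would come from
    client.get_table(...).num_bytes in the workflow above.
    """
    return single_uri if num_bytes < ONE_GB else wildcard_uri
```

Note that omitting the wildcard is the only way to guarantee one output file; with a wildcard URI, BigQuery may still shard a sub-1 GB export, which is exactly what you observed.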