I am using the BigQuery Python client library to export data from BigQuery tables to GCS in CSV format.
I have given a wildcard pattern in the destination URI, assuming some tables can be larger than 1 GB.
Sometimes, even though a table is only a few MB, the export creates multiple files, and sometimes it creates just one file.
Is there any logic behind this?
My export workflow is the following:
    project = bq_project
    dataset_id = bq_dataset_id
    table_id = bq_table_id
    bucket_name = bq_bucket_name
    workflow_name = workflow_nm
    csv_file_nm = workflow_nm + "/" + csv_file_prefix_in_gcs + '*'
    client = bigquery.Client()
    destination_uri = "gs://{}/{}".format(bucket_name, csv_file_nm)
    dataset_ref = client.dataset(dataset_id, project=project)
    table_ref = dataset_ref.table(table_id)
    destination_table = client.get_table(dataset_ref.table(table_id))
    configuration = bigquery.job.ExtractJobConfig()
    configuration.destination_format = 'CSV'
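To make the file naming concrete, here is a minimal sketch of how the wildcard in the destination URI turns into object names. As far as I understand the export documentation, BigQuery replaces the single `*` with a 12-digit, zero-padded shard number starting at 000000000000. The bucket, workflow, and prefix values and both helper functions are made up for illustration; the sketch does not call BigQuery and says nothing about how many shards the service decides to write.

```python
def build_destination_uri(bucket_name, workflow_nm, csv_file_prefix):
    """Build a wildcard GCS URI the same way the snippet above does."""
    csv_file_nm = workflow_nm + "/" + csv_file_prefix + "*"
    return "gs://{}/{}".format(bucket_name, csv_file_nm)


def shard_uri(destination_uri, shard_index):
    """Replace the single '*' with a 12-digit, zero-padded shard number."""
    if destination_uri.count("*") != 1:
        raise ValueError("expected exactly one '*' in the destination URI")
    return destination_uri.replace("*", "{:012d}".format(shard_index))


# Hypothetical values, only for illustration.
uri = build_destination_uri("my-bucket", "my_workflow", "export_")
first_three = [shard_uri(uri, i) for i in range(3)]

print(uri)
for name in first_three:
    print(name)
```

If the export produces a single file, only the object for shard 0 would exist under this naming; with several shards the numbers simply count up.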
csv_file_nm=workflow_nm+"/"+csv_file_prefix_in_gcs
gs://my-bucket/file-name-*.json or gs://my-bucket/file-name-<worker number>-*.json? - Chris32