BigQuery splits an export into several files when the exported data is larger than 1 GB, but you can merge those files afterwards with the gsutil tool. See the official documentation on object composition with gsutil for the details.
Since BigQuery exports the files with a common prefix, you can use a wildcard * to merge them into one composite object:
gsutil compose gs://example-bucket/component-obj-* gs://example-bucket/composite-object
Note that there is a limit (currently 32) to the number of components that can be composed in a single operation.
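If you end up with more than 32 exported files, you can work around that limit by composing in batches and folding the intermediate result into the next batch. Here is a minimal sketch using the google-cloud-storage Python client; the bucket name and object prefix are placeholders taken from the gsutil example above:
from google.cloud import storage

# Assumed bucket/prefix names, used for illustration only.
client = storage.Client()
bucket = client.bucket('example-bucket')

# Collect all exported component objects.
components = list(bucket.list_blobs(prefix='component-obj-'))

# Compose in batches (32 components max per operation),
# reusing the composite object as the first source of each new batch.
composite = bucket.blob('composite-object')
batch, rest = components[:32], components[32:]
composite.compose(batch)
while rest:
    batch, rest = rest[:31], rest[31:]
    composite.compose([composite] + batch)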
The downside of this option is that the header row of each .csv file ends up inside the composite object. You can avoid this by modifying the job configuration and setting the print_header parameter to False.
Here is a Python code sample, but you can use any other BigQuery client library:
from google.cloud import bigquery

client = bigquery.Client()
bucket_name = 'yourBucket'

project = 'bigquery-public-data'
dataset_id = 'libraries_io'
table_id = 'dependencies'

destination_uri = 'gs://{}/{}'.format(bucket_name, 'file-*.csv')
dataset_ref = client.dataset(dataset_id, project=project)
table_ref = dataset_ref.table(table_id)

# Skip the header row in every exported file.
job_config = bigquery.job.ExtractJobConfig(print_header=False)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location='US',
    job_config=job_config)  # API request
extract_job.result()  # Waits for job to complete.

print('Exported {}:{}.{} to {}'.format(
    project, dataset_id, table_id, destination_uri))
Finally, remember to compose in a .csv file containing just the header row, placed as the first component, so the final object still starts with the headers.
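As a sketch of that last step, assuming the yourBucket bucket and the file-*.csv prefix from the sample above, plus a hypothetical header.csv object and column names, the composition could look like this with the google-cloud-storage client:
from google.cloud import storage

# Assumed names for illustration: adjust the bucket, header file,
# column names and prefix to your own export.
client = storage.Client()
bucket = client.bucket('yourBucket')

# Upload a one-line CSV containing only the column names.
header_blob = bucket.blob('header.csv')
header_blob.upload_from_string('col_a,col_b,col_c\n', content_type='text/csv')

# Compose the header first, then the exported (header-less) files.
# Subject to the 32-component limit; batch as shown earlier if needed.
data_blobs = list(bucket.list_blobs(prefix='file-'))
final_blob = bucket.blob('final.csv')
final_blob.compose([header_blob] + data_blobs)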