0
votes

I am exporting a table of more than 1 GB from BigQuery to GCS, but the export is split into very small files of 2-3 MB. Is there a way to get bigger files, e.g. 40-60 MB per file, rather than 2-3 MB?

I do the export via the API: https://cloud.google.com/bigquery/docs/exporting-data#exporting_data_into_one_or_more_files

https://cloud.google.com/bigquery/docs/reference/v2/jobs

The source table is 60 GB in BigQuery. I extract the data in NEWLINE_DELIMITED_JSON format with GZIP compression.

destination_cloud_storage_uris=[
        'gs://bucket_name/main_folder/partition_date=xxxxxxx/part-*.gz'
    ]
2
Which of the options here (cloud.google.com/bigquery/docs/…) are you using for the extract? If you just run the following command: "bq extract <your-dataset>.<your-table> gs://<bucket-destination>/<file_name>*.csv", does it split your table into 2-3 MB files? I just tried that command and it split my 2.4 GB table into 6 files. – VictorGGl
More details added to the question. – Amit Kumar
Based on your description, this doesn't seem to be working as intended for you. Unless your table is a partitioned table, you shouldn't get files this small. Can you raise a private issue at this link: issuetracker.google.com/issues/new?component=187164, providing your Project Number (not Project ID) and as much information as possible describing your case? Feel free to post the issue link here as a comment, and the final resolution as an answer to your own question. – VictorGGl

2 Answers

2
votes

Are you trying to export a partitioned table? If so, each partition is exported as a separate table, which can produce small files. I ran the export from the CLI with each of the following commands and in both cases received files of 49 MB:

bq extract --compression=GZIP --destination_format=NEWLINE_DELIMITED_JSON project:dataset.table gs://bucket_name/path5-component/file-name-*.gz

bq extract --compression=GZIP project:dataset.table gs://bucket_name/path5-component/file-name-*.gz
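
Since the question goes through the API rather than the CLI, here is roughly what the first command looks like as a jobs.insert request body. This is only a sketch: build_extract_job_body is a hypothetical helper, the project, dataset, table and bucket names are placeholders, and the field names are the ones I know from the v2 jobs reference linked in the question. As far as I can tell, none of the extract fields sets a target file size.

```python
import json


def build_extract_job_body(project, dataset, table, destination_uri):
    """Build a jobs.insert body for a gzip-compressed JSON export.

    Hypothetical helper for illustration; all arguments are placeholders.
    """
    return {
        "configuration": {
            "extract": {
                "sourceTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                # A single wildcard URI lets BigQuery shard the output.
                "destinationUris": [destination_uri],
                "destinationFormat": "NEWLINE_DELIMITED_JSON",
                "compression": "GZIP",
            }
        }
    }


body = build_extract_job_body(
    "project", "dataset", "table",
    "gs://bucket_name/path5-component/file-name-*.gz",
)
print(json.dumps(body, indent=2))
```

If the files are still tiny with a body like this on a non-partitioned table, that points at the table rather than the request.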
1
votes

Please add more details to the question so we can provide specific advice: how exactly are you requesting this export?

Nevertheless, if you have many files in GCS and you want to merge them all into one, you can do:

gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/composite
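
With thousands of 2-3 MB shards you will hit the compose limit of 32 source objects per call, so the merge has to be done in batches. Below is a rough sketch that only plans the gsutil command lines (it does not run them). plan_compose_commands is a hypothetical helper, the object names are placeholders, and it assumes the destination can be reused as the first source to append further batches. The last lines check locally that concatenated gzip members still decompress to the concatenated data, which is what a byte-level merge of .gz shards relies on.

```python
import gzip

MAX_SOURCES = 32  # compose accepts at most 32 source objects per call


def plan_compose_commands(sources, destination, max_sources=MAX_SOURCES):
    """Return gsutil compose argument lists that merge sources in order.

    Hypothetical helper: the first call composes up to max_sources shards,
    later calls append to the destination by listing it as the first source
    (assumption: the destination may also be a source).
    """
    if not sources:
        raise ValueError("need at least one source object")
    commands = [["gsutil", "compose", *sources[:max_sources], destination]]
    rest = sources[max_sources:]
    step = max_sources - 1  # one source slot is taken by the destination
    for start in range(0, len(rest), step):
        batch = rest[start:start + step]
        commands.append(["gsutil", "compose", destination, *batch, destination])
    return commands


# Placeholder shard names, not real objects.
shards = [f"gs://bucket/part-{i:012d}.gz" for i in range(70)]
commands = plan_compose_commands(shards, "gs://bucket/composite.gz")
for cmd in commands:
    print(" ".join(cmd[:3]), "...", cmd[-1])

# Concatenated gzip members form a valid multi-member gzip stream.
merged = gzip.compress(b'{"a": 1}\n') + gzip.compress(b'{"a": 2}\n')
assert gzip.decompress(merged) == b'{"a": 1}\n{"a": 2}\n'
```

Each planned command could then be passed to subprocess.run, or the same batching could be applied to whatever client you already use.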