
We need to generate a CSV file from data which resides in multiple tables in BigQuery.

Currently we are building a Java service that fetches the data into a TableResult object with the help of the google-cloud-bigquery package and then loops through it to create the CSV. How fast and scalable is this method?

Or is there any other option provided by GCP to accomplish this task? Please suggest.


1 Answer


The most efficient way to do this is to use the built-in BigQuery export to GCS mechanism, because BigQuery uses its own parallelism to write the output with multiple workers. You can export up to 1 GB of table data to a single file. If you are exporting more than 1 GB of data, use a wildcard to export the data into multiple files:

gs://<BUCKET_NAME>/csvfilename*.csv
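
Since you already have a Java service, you can start the extract job from there with the same google-cloud-bigquery client instead of looping over a TableResult. The sketch below assumes placeholder names (my_dataset, my_table, my-bucket); replace them with your own:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.ExtractJobConfiguration;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.TableId;

public class ExportTableToGcs {
  public static void main(String[] args) throws InterruptedException {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Placeholder identifiers -- replace with your dataset, table and bucket.
    TableId tableId = TableId.of("my_dataset", "my_table");
    String destinationUri = "gs://my-bucket/csvfilename*.csv"; // wildcard -> multiple shards

    // Configure a CSV extract job; BigQuery writes the shards in parallel.
    ExtractJobConfiguration extractConfig =
        ExtractJobConfiguration.newBuilder(tableId, destinationUri)
            .setFormat("CSV")
            .build();

    Job job = bigquery.create(JobInfo.of(extractConfig));
    Job completedJob = job.waitFor();

    if (completedJob != null && completedJob.getStatus().getError() == null) {
      System.out.println("Export finished: " + destinationUri);
    } else {
      System.out.println("Export failed: "
          + (completedJob != null ? completedJob.getStatus().getError() : "job no longer exists"));
    }
  }
}

If your data lives in multiple tables, run one query that joins or unions them into the shape you need (optionally into a temporary or destination table) and export that result.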

Then, when all your .csv files are in the Storage bucket, you can use the gsutil compose command, which creates a new object whose content is the concatenation of a given sequence of source objects in the same bucket:

gsutil compose gs://<BUCKET_NAME>/a.csv gs://<BUCKET_NAME>/b.csv gs://<BUCKET_NAME>/composite.csv

where composite.csv is the concatenation of a.csv and b.csv. Please refer to the documentation.
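
If you would rather do the concatenation from your Java service instead of gsutil, the google-cloud-storage client exposes the same compose operation. Again, the bucket and object names below are placeholders:

import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class ComposeCsvShards {
  public static void main(String[] args) {
    Storage storage = StorageOptions.getDefaultInstance().getService();

    // Placeholder bucket and object names -- replace with your own.
    String bucket = "my-bucket";

    // Concatenate a.csv and b.csv into composite.csv within the same bucket.
    Storage.ComposeRequest request =
        Storage.ComposeRequest.newBuilder()
            .setTarget(BlobInfo.newBuilder(bucket, "composite.csv")
                .setContentType("text/csv")
                .build())
            .addSource("a.csv")
            .addSource("b.csv")
            .build();

    Blob composite = storage.compose(request);
    System.out.println("Created " + composite.getName() + " (" + composite.getSize() + " bytes)");
  }
}

Keep in mind that a single compose request accepts at most 32 source objects, and that each exported shard will contain its own header row unless you turn headers off on the extract job (the Java client's setPrintHeader(false) on ExtractJobConfiguration, if I recall correctly).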

I hope it helps.