13 votes

I want to export query results from BigQuery to a local file / Google Cloud Storage.

I tried the 'bq extract' command, but it doesn't accept a query as input.

Usage: bq extract <source_table> <destination_uris>

I don't want to extract the whole table, as the table contains many columns that are not required and I need to aggregate the data.

As of now, the only workaround I could find is to create a table using the 'bq query' command and then use 'bq extract' to extract the data.

I'm looking for a better way to achieve this, by doing something like the following:

bq extract 'select dept_id, sum(sal) from temp.employee group by dept_id' 'gs://XXXX/employee.csv'

8 Answers

10 votes

Direct export from BigQuery Standard SQL was added recently: Exporting data to csv format

EXPORT DATA OPTIONS(
  uri='gs://mybucket/myfolder2/*.csv',
  format='CSV',
  overwrite=true,
  header=true,
  field_delimiter=';') AS
SELECT 1 as field1, 2 as field2 
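
For example, to run the statement from the bq CLI rather than the console, something along these lines should work (a sketch using the table and bucket placeholders from the question; the wildcard in the URI follows the example above):

bq query --use_legacy_sql=false '
EXPORT DATA OPTIONS(
  uri="gs://XXXX/employee_*.csv",
  format="CSV",
  overwrite=true,
  header=true) AS
SELECT dept_id, SUM(sal) AS total_sal
FROM temp.employee
GROUP BY dept_id'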
4 votes

BigQuery does not provide the ability to directly export/download query results to GCS or a local file. First you need to get the query result into a table: either an explicitly set destination table or, if none is set, the temporary (anonymous) table that holds the query result - you can get that table from the job attribute configuration.query.destinationTable (after the job is completed).
Then you can use that table as the source for an export job.
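
A minimal sketch of that two-step flow with the bq CLI, reusing the query and bucket from the question (temp.employee_agg is just a hypothetical name for the intermediate table):

# Step 1: write the query result into an explicit destination table
bq query --use_legacy_sql=false \
  --destination_table=temp.employee_agg \
  'SELECT dept_id, SUM(sal) AS total_sal FROM temp.employee GROUP BY dept_id'

# Step 2: export that table to GCS as CSV
bq extract --destination_format=CSV temp.employee_agg 'gs://XXXX/employee.csv'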

4 votes

As Mikhail Berlyant said,

BigQuery does not provide the ability to directly export/download query results to GCS or a local file.

You can still export it using the Web UI in just three steps:

  1. Configure query to save the results in a BigQuery table and run it.
  2. Export the table to a bucket in GCS.
  3. Download from the bucket.

Step 1

When in the BigQuery screen, before running the query, go to More > Query Settings.

(screenshot: Configure Query)

This opens the following:

(screenshot: Query Settings)

Here you want to set:

  • Destination: Set a destination table for query results
  • Project name: select the project.
  • Dataset name: select a dataset. If you don't have one, create it and come back.
  • Table name: give whatever name you want (must contain only letters, numbers, or underscores).
  • Result size: Allow large results (no size limit).

Then save it, and the query is configured to be saved into the specified table. Now you can run the query.

Step 2

To export it to GCS, go to the table and click EXPORT > Export to GCS.

(screenshot: BigQuery export table)

This opens the following screen:

(screenshot: Export to GCS)

In Select GCS location, you define the bucket, the folder, and the file name.

For instance, suppose you have a bucket named daria_bucket (use only lowercase letters, numbers, hyphens (-), and underscores (_); dots (.) may be used to form a valid domain name) and you want to save the file(s) in the root of the bucket with the name test. Then you write (in Select GCS location):

daria_bucket/test.csv

If the file is too big (more than 1 GB), you'll get an error. To fix it, you'll have to split the export into multiple files using a wildcard, so you need to add *, just like this:

daria_bucket/test*.csv

(screenshot: Wildcard export to GCS)

This is going to store, inside the bucket daria_bucket, all the data extracted from the table in more than one file, named test000000000000, test000000000001, test000000000002, ... testX.

Step 3

Then go to Storage and you'll see the bucket.

(screenshot: GCS bucket)

Go inside it and you'll find the file(s). You can then download them from there.
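
If you prefer, steps 2 and 3 can also be done from the command line instead of the console - a rough sketch, assuming the table saved in step 1 is called mydataset.mytable (adjust the names to your own):

# Export the result table to the bucket, split across files by the wildcard
bq extract --destination_format=CSV mydataset.mytable 'gs://daria_bucket/test*.csv'

# Download the exported file(s) from the bucket to the current directory
gsutil cp 'gs://daria_bucket/test*.csv' .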

4 votes

You can always use bash scripts for things that are not supported by the gcloud CLI. For example, you can dump query results into a file:

bq query --format=csv --max_rows=999999 --use_legacy_sql=false \
  "select dept_id, sum(sal) from temp.employee group by dept_id" > employee.csv

And then you can just upload it to Cloud Storage:

gsutil mv employee.csv 'gs://XXXX/employee.csv'
3 votes

These days (until they change the console again) you can achieve this as follows:

  1. Run your query.
  2. When it has completed, use the "Save the result" dropdown to save it to a local or cloud file.

Be advised this option is not available from the command line or the API at the moment :-)

2 votes

Filtering during export is not supported. You can store the result of the query in another (temporary) table and export that table, so it will be a two-step process.

2 votes

As @MikhailBerlyan points out, you need an intermediate table. You can do this using the BigQuery UI:

  • Run the query that does the filtering ('select dept_id, sum(sal) from temp.employee group by dept_id' in your case).
  • Once the query is done, click on Save as->Save as table. This step creates the intermediate table with the filtering already applied.
  • Open that table, and on the rightmost side click on Export->Export to GCS.

Here is the documentation for exporting.

0 votes

My company has not subscribed to Google Drive, so we use this workaround:

  1. Run the query in the BQ query editor.
  2. Click on Save Query Results and select BigQuery Table from the Choose where to save the results data from the query dropdown.
  3. Select the project name and dataset name, provide a table name (e.g. data_dump_13_jan), and click on Save.
  4. Depending on the data set, it might take a few seconds to a few hours.
  5. Once the job is done, go to the Cloud Shell terminal and run the following command.

bq --location=<your dataset location> extract --destination_format CSV <dataset-id>.<table-name> gs://<gcp-bucket>/<dump-file-name>.csv