5
votes

Is there a way to extract the complete BigQuery partitioned table with one command so that data of each partition is extracted into a separate folder of the format part_col=date_yyyy-mm-dd

Since Bigquery partitioned table can read files from the hive type partitioned directories, is there a way to extract the data in a similar way. I can extract each partition separately, however that is very cumbersome when i an extracting a lot of partitions

1
seems like a feature request for the issue trackerFelipe Hoffa

1 Answers

8
votes

You could do this programmatically. For instance, you can export partitioned data by using the partition decorator such as table$20190801. And then on the bq extract command you can use URI Patterns (look the example of the workers pattern) for the GCS objects.

Since all objects will be within the same bucket, the folders are just an hierarchical illusion, so you can specify URI patterns on the folders as well, but not on the bucket.

So you would do a script where you loop over the DATE value, with something like:

bq extract 
--destination_format [CSV, NEWLINE_DELIMITED_JSON, AVRO] 
--compression [GZIP, AVRO supports DEFLATE and SNAPPY] 
--field_delimiter [DELIMITER] 
--print_header [true, false] 
[PROJECT_ID]:[DATASET].[TABLE]$[DATE]
gs://[BUCKET]/part_col=[DATE]/[FILENAME]-*.[csv, json, avro]

You can't do it automatically with just a bq command. For this it would be better to raise a feature request as suggested by Felipe.