0
votes

According to the BigQuery docs, I should be able to export a single partition of a partitioned table:

Exporting all data from a partitioned table is the same process as exporting data from a non-partitioned table. For more information, see Exporting table data. To export data from an individual partition, append the partition decorator, $date, to the table name. For example: mytable$20160201.

However running the following extract command extracts the entire table, not just one partition. It is driving me nuts! What am I doing wrong?

bq --location=europe-west2 extract \
--destination_format NEWLINE_DELIMITED_JSON \
--compression GZIP \
bq-project-name:dataset.table_name$20200405 \
"gs://bucket-name/test_ga_sessions*.json.gz"

Adding partitioning information of source table here

Source Table Partitioning Info

I have also confirmed that the partition I am attempting to extract exists

#legacySQL
SELECT
  partition_id,
  creation_time,
  creation_timestamp,
  last_modified_time,
  last_modified_timestamp
FROM
  [dataset.tablename$__PARTITIONS_SUMMARY__]
  where partition_id = '20200405'

enter image description here

1
Have you tried quoting the table name? The $ operator might be messing with the table name in the terminalTlaquetzal
Thanks @Tlaquetzal. I tried that too and it didn't work eitherGraham
On the top of my mind, another thing that might be messing with the perception of the partitions is: the table might be partitioned by ingestion time rather than a date column. I noticed that in the bucket-name you're using has the name "ga_sessions" for the end file. If you're using information from the GA export the tables you are looking to export might not even be partitioned. Can you update the question with the partition details of the table you're looking to export?Tlaquetzal
@Tlaquetzal the table I am exporting contains transformed GA data hence the name. It is not an original GA table. I have updated my question to include partitioning information for the table I am trying to export.Graham
Thanks for the update. It all looks fine, so it might be an issue with the service. Last things I will check, is, 1) what is the behavior when you try to export a partition that doesn't exist, for example the day 20100101. Does this export all the table as well ?; and 2) Consider that when using wildcard in the destination URI the count in the file name will start at 00000 not the exported date. Let me know if this information doesn't provide new insightsTlaquetzal

1 Answers

1
votes

Because I was running the bq extract command in a bash shell, the partition decorator $20200405 was being interpreted as a variable and an empty one at that. Therefore the full partition identifier of bq-project-name:dataset.table_name$20200405 was being interpreted as bq-project-name:dataset.table_name by the time the request reached BigQuery.

In order to get this command to run correctly, all I had to do was escape the $ character of the partition decorator with a backslash as follows:

bq --location=europe-west2 extract \
--destination_format NEWLINE_DELIMITED_JSON \
--compression GZIP \
bq-project-name:dataset.table_name\$20200405 \
"gs://bucket-name/test_ga_sessions*.json.gz"