Does apache_beam Python SDK 0.6.0 BigQuerySink support table partitions?

Question

I have been prototyping a beam pipeline using their python SDK and have been able to use the BigQuerySink to output my final pcollection just fine using this:

     beam.io.Write(beam.io.BigQuerySink('dataset.table', 
                                self.get_schema(),
       create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,                                               
     write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))

modifying the table to include a partition such as this: dataset.table$20170517 triggers the following error when trying to run this pipeline with the DirectRunner

"code": 400, "message": "Cannot read partition information from a table that is not partitioned:

I have studied the examples found here but found no trace of partition use https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples

How can beam sink data into partitioned bigquery tables?

Looking at the code, it seems like table partitions are actually supported. (github.com/apache/beam/blob/release-0.6.0/sdks/python/…). Is your table properly partitioned? I'm checking to see if DirectRunner supports the operation. — Pablo
@pablo @Graham Polley: Yes the table is partitioned. When I bq show the table I get "timePartitioning": { "type": "DAY" } Now after I execute my pipeline and load data without specifying the partition, then the partition information gets removed. It's as if the BigQuerySink removes the partitioning from the table. — Jean-Christophe Rodrigue

Jean-Christophe Rodrigue Jean-Christophe Rodrigue · Accepted Answer · 2017-05-18T22:35:23

The apache_beam Python SDK does accept partition decorators for the BigQuerySink. Experimenting with the different write_disposition available reveals more information.

WRITE_TRUNCATE will not write to table partitions. Using the $YYYYmmdd partition in the table name will result in this error. This differs from the Google Python SDK behaviour which will actually accept the partition decorator.
```
Table IDs must be alphanumeric (plus underscores) and 
must be at most 1024 characters long. 
```
WRITE_EMPTY will accept the partition decorator.
WRITE_APPEND will accept the partition decorator.

Does apache_beam Python SDK 0.6.0 BigQuerySink support table partitions?

1 Answers