0
votes

I have been prototyping a beam pipeline using their python SDK and have been able to use the BigQuerySink to output my final pcollection just fine using this:

     beam.io.Write(beam.io.BigQuerySink('dataset.table', 
                                self.get_schema(),
       create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,                                               
     write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))

modifying the table to include a partition such as this: dataset.table$20170517 triggers the following error when trying to run this pipeline with the DirectRunner


"code": 400, "message": "Cannot read partition information from a table that is not partitioned:


I have studied the examples found here but found no trace of partition use https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples

How can beam sink data into partitioned bigquery tables?

1
Looking at the code, it seems like table partitions are actually supported. (github.com/apache/beam/blob/release-0.6.0/sdks/python/…). Is your table properly partitioned? I'm checking to see if DirectRunner supports the operation. - Pablo
Is your table actually partitioned? - Graham Polley
@pablo @Graham Polley: Yes the table is partitioned. When I bq show the table I get "timePartitioning": { "type": "DAY" } Now after I execute my pipeline and load data without specifying the partition, then the partition information gets removed. It's as if the BigQuerySink removes the partitioning from the table. - Jean-Christophe Rodrigue

1 Answers

1
votes

The apache_beam Python SDK does accept partition decorators for the BigQuerySink. Experimenting with the different write_disposition available reveals more information.

  • WRITE_TRUNCATE will not write to table partitions. Using the $YYYYmmdd partition in the table name will result in this error. This differs from the Google Python SDK behaviour which will actually accept the partition decorator.

    Table IDs must be alphanumeric (plus underscores) and 
    must be at most 1024 characters long. 
    
  • WRITE_EMPTY will accept the partition decorator.
  • WRITE_APPEND will accept the partition decorator.