I have been prototyping a Beam pipeline using the Python SDK and have been able to use the BigQuerySink to output my final PCollection just fine using this:
```python
beam.io.Write(beam.io.BigQuerySink(
    'dataset.table',
    self.get_schema(),
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))
```
Modifying the table reference to include a partition decorator, such as dataset.table$20170517, triggers the following error when running the pipeline with the DirectRunner:
"code": 400, "message": "Cannot read partition information from a table that is not partitioned:
I have studied the examples at https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples but found no trace of partition use.
How can Beam sink data into partitioned BigQuery tables?
When I run bq show on the table I get "timePartitioning": { "type": "DAY" }. But after I execute my pipeline and load data without specifying the partition, the partition information is removed. It's as if the BigQuerySink removes the partitioning from the table. – Jean-Christophe Rodrigue
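One way to check whether the partitioning survived a load is to parse the table metadata JSON (e.g. from `bq show --format=json`) and look for the timePartitioning key. A sketch with a hard-coded sample payload standing in for the command's real output:

```python
import json

# Hypothetical sample of `bq show --format=json dataset.table` output;
# a real check would parse the command's actual stdout instead.
metadata = json.loads('{"timePartitioning": {"type": "DAY"}}')

def is_day_partitioned(table_meta):
    # A table is day-partitioned when its metadata carries
    # "timePartitioning": {"type": "DAY"}.
    return table_meta.get("timePartitioning", {}).get("type") == "DAY"

print(is_day_partitioned(metadata))
# True
```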