
Is there a way to set the expiration time on a BigQuery table when using Dataflow's BigQueryIO.Write sink?

For example, I'd like something like this (see last line):

PCollection<TableRow> mainResults = ...;
mainResults.apply(BigQueryIO.Write
                .named("my-bq-table")
                .to("PROJECT:dataset.table")
                .withSchema(getBigQueryTableSchema())
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_TRUNCATE)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                .withExpiration(1452030098L)); // this table should expire on 31st Jan

I can't see anything in the Dataflow API that would facilitate this. Of course, I could just use the BigQuery API directly, but it would be much better to be able to do this via Dataflow when specifying the sink.
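For reference, falling back to the BigQuery API would look roughly like this (a sketch against the v2 Java client, assuming an already-authorized Bigquery client named bigquery; the IDs are placeholders, and note that expirationTime is in epoch milliseconds):

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.Table;

// Patch the table's expirationTime after the pipeline has written to it.
Table patch = new Table().setExpirationTime(1454198400000L); // 31st Jan 2016, 00:00 UTC
bigquery.tables().patch("PROJECT", "dataset", "table", patch).execute();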


2 Answers


This isn't currently supported in the Dataflow API. We can look at adding it soon, as it should be a straightforward addition.


You can set defaultTableExpirationMs on a dataset, and any table created within that dataset will then have an expiration time of "now + dataset.defaultTableExpirationMs".

See https://cloud.google.com/bigquery/docs/reference/v2/datasets#defaultTableExpirationMs
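For example, with the v2 Java client (a sketch, assuming an already-authorized Bigquery client named bigquery; the project/dataset IDs and the 7-day value are placeholders):

import com.google.api.services.bigquery.Bigquery;
import com.google.api.services.bigquery.model.Dataset;

// Give every table subsequently created in the dataset an expiration of
// "creation time + 7 days".
Dataset patch = new Dataset().setDefaultTableExpirationMs(7 * 24 * 60 * 60 * 1000L);
bigquery.datasets().patch("PROJECT", "dataset", patch).execute();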