I have a number of text files containing data that I want to import into a date-partitioned BigQuery table from a DataflowPipelineRunner running in batch mode. Instead of inserting into the current day's partition at runtime, I want to insert into a partition based on a date found in each row. (Unfortunately I can't use the bq
command-line tool to import the text files directly, since I need to transform some of the values.)
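For context, a single day partition is addressed by suffixing the table name with a "$" decorator and the day in yyyyMMdd form. A minimal illustration of building that decorated name (the table name here is made up, and I use java.time rather than Joda just for the example):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class PartitionDecorator {
    // BigQuery addresses one day partition of a partitioned table as "<table>$yyyyMMdd".
    static String decorate(String table, LocalDate day) {
        return table + "$" + day.format(DateTimeFormatter.ofPattern("yyyyMMdd"));
    }

    public static void main(String[] args) {
        System.out.println(decorate("mydataset.mytable", LocalDate.of(2016, 8, 31)));
        // prints mydataset.mytable$20160831
    }
}
```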
I have tried outputting each element with a timestamp from the ParDo function, windowing the collection into days, and then deriving the destination table name from the window by appending "$" and the corresponding date:
BigQueryIO.Write
    .to(new SerializableFunction<BoundedWindow, String>() {
      @Override
      public String apply(BoundedWindow window) {
        // Format the window start as yyyyMMdd in the local time zone and
        // append it to the table spec as a partition decorator.
        String dayString = DateTimeFormat.forPattern("yyyyMMdd")
            .withZone(DateTimeZone.forID("Europe/Stockholm"))
            .print(((IntervalWindow) window).start());
        return dataset + "$" + dayString;
      }
    })
    .withSchema(schema.getSchema())
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND);
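To make the intent of that SerializableFunction concrete, here is a self-contained version of the same window-start-to-table mapping using java.time instead of Joda (table name and method name are my own; the time-zone handling mirrors the snippet above):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class WindowToTable {
    // Maps a window-start instant to "<table>$yyyyMMdd", interpreting the
    // instant in the Europe/Stockholm time zone as the Joda code above does.
    static String tableFor(Instant windowStart, String table) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyyMMdd")
                .withZone(ZoneId.of("Europe/Stockholm"));
        return table + "$" + fmt.format(windowStart);
    }

    public static void main(String[] args) {
        // 2016-08-31T00:00:00Z is 02:00 local time in Stockholm (CEST),
        // so the element lands in the 20160831 partition.
        System.out.println(tableFor(Instant.parse("2016-08-31T00:00:00Z"), "mydataset.mytable"));
        // prints mydataset.mytable$20160831
    }
}
```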
When I try to run this I hit a known Dataflow bug, and I also found the following limitation in the documentation:
"Per-window tables are not yet supported in batch mode."
So how can I write to a date-partitioned table, with the partition chosen by a date specified in each row?