My streaming Dataflow pipeline, which pulls data from PubSub, doesn't write anything to BigQuery and logs no errors. The elements go into the node "Write to BigQuery/StreamingInserts/StreamingWriteTables/Reshuffle/GroupByKey", which is created implicitly by this code:
PCollection<TableRow> rows = ...;

rows.apply("Write to BigQuery",
    BigQueryIO.writeTableRows().to(poptions.getOutputTableName())
        .withSchema(...)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
        .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
        .withFailedInsertRetryPolicy(InsertRetryPolicy.alwaysRetry())
        .withExtendedErrorInfo());
But elements never leave it, or at least not within the system lag, which is now at 45 minutes. This is supposed to be a streaming job; how can I make it flush and write the data? This is Beam version 2.13.0. Thank you.
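For reference, rows comes out of PubSub roughly like this; the subscription option (getInputSubscription) and the trivial String-to-TableRow conversion below are simplified placeholders, not my exact code:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

PCollection<TableRow> rows = pipeline
    .apply("Read from PubSub",
        PubsubIO.readStrings().fromSubscription(poptions.getInputSubscription()))
    .apply("To TableRow", ParDo.of(new DoFn<String, TableRow>() {
        @ProcessElement
        public void processElement(@Element String json, OutputReceiver<TableRow> out) {
            // Placeholder conversion; the real code parses the message into its fields.
            out.output(new TableRow().set("payload", json));
        }
    }));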
UPDATE: the Stackdriver log for the step that writes data to BigQuery shows no errors.
I can also add that this works if I use the DirectRunner in the cloud (but only for small amounts of data), and with either runner if I insert row by row using the Java interface to BigQuery (but that is at least two orders of magnitude too slow to begin with).
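The row-by-row path I mean is essentially the standard insertAll pattern from the google-cloud-bigquery client, roughly like the sketch below; the dataset, table, and method names are placeholders, not my actual code:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.InsertAllRequest;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.TableId;
import java.util.Map;

// Inside a helper class: one synchronous request per row. It writes correctly,
// but at streaming volume it is far too slow.
static void insertOneRow(String dataset, String table, Map<String, Object> rowContent) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    InsertAllResponse response = bigquery.insertAll(
        InsertAllRequest.newBuilder(TableId.of(dataset, table))
            .addRow(rowContent)
            .build());
    if (response.hasErrors()) {
        // Per-row insert errors show up here when inserting directly.
        response.getInsertErrors().forEach((row, errors) -> System.err.println(row + ": " + errors));
    }
}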