2 votes

As per "Shutdown and update job in Google Dataflow with PubSubIO + message guarantees", the Pub/Sub source for Dataflow does not ack messages until they have been reliably persisted. Is there any way to control this manually? We're persisting rows as a side effect in a ParDo, since there is currently no unbounded custom sink support. Is there any way for us to mark that ParDo as "on bundle processing success, ack these records"?
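For reference, the side-effect persistence looks roughly like the sketch below (written against the Dataflow Java SDK; MyDatabaseClient and RowMapper are hypothetical stand-ins for our real storage code):

    // Rough sketch of the side-effect persistence in a ParDo.
    // MyDatabaseClient and RowMapper stand in for our real storage code.
    import com.google.cloud.dataflow.sdk.transforms.DoFn;

    public class PersistRowsFn extends DoFn<String, String> {
      private transient MyDatabaseClient client;  // hypothetical storage client

      @Override
      public void startBundle(Context c) {
        client = MyDatabaseClient.connect();  // one connection per bundle
      }

      @Override
      public void processElement(ProcessContext c) {
        // Persist the row as a side effect. If this throws, the bundle fails
        // and the corresponding Pub/Sub messages should not be acked.
        client.insertRow(RowMapper.fromMessage(c.element()));
        c.output(c.element());  // pass the element through to downstream steps
      }

      @Override
      public void finishBundle(Context c) {
        client.close();
      }
    }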

Alternatively, could we persist as a side effect in a ParDo, throw an exception if the write fails, and then follow that ParDo with some sort of "dummy" streaming sink such as BigQuery to make sure the messages are acked? Would throwing exceptions as part of normal, expected behaviour lead to new problems?
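If we went that route, the pipeline shape would be roughly the sketch below (again Dataflow Java SDK; PersistRowsFn is the hypothetical DoFn from above, and ToTableRowFn, the topic, table, and schema are made-up placeholders):

    // Rough pipeline shape for the "dummy sink" idea: read from Pub/Sub,
    // persist as a side effect, then write to BigQuery purely to close out
    // the pipeline and get the messages acked.
    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.io.BigQueryIO;
    import com.google.cloud.dataflow.sdk.io.PubsubIO;
    import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.transforms.ParDo;

    public class DummySinkPipeline {
      public static void main(String[] args) {
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
        options.setStreaming(true);  // streaming mode so PubsubIO is unbounded

        Pipeline p = Pipeline.create(options);
        p.apply(PubsubIO.Read.topic("projects/my-project/topics/my-topic"))
         .apply(ParDo.of(new PersistRowsFn()))   // persist rows; throws on failure
         .apply(ParDo.of(new ToTableRowFn()))    // convert to TableRow for BigQuery
         .apply(BigQueryIO.Write
             .to("my-project:my_dataset.my_table")
             .withSchema(ToTableRowFn.SCHEMA));  // "dummy" sink to force acks
        p.run();
      }
    }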

Is the answer here really "just wait for unbounded custom sink support"?


1 Answer

2 votes

I believe Dataflow automatically gives you the behavior you want: we will not ack Pub/Sub messages until we have finished processing them with your ParDos and persisted the results.