With the 2.2 Dataflow SDK, is it possible to achieve exactly-once delivery using the messageId field that Pub/Sub automatically assigns? Or is this perhaps even the default behavior?

It seems like this would normally be possible with PubsubIO.Read.withIdAttribute, but I don't have control over the published messages, so I was hoping to use the messageId field.

The old 1.x docs state:

In addition, you can achieve exactly once processing of Pub/Sub message streams, as PubsubIO de-duplicates messages based on custom message identifiers or identifiers assigned by Pub/Sub.

The last part hints that the default behavior of the read might already use the messageId for exactly-once delivery purposes.

1 Answer

Yes, 1.x and 2.2 behave the same in this regard. Both provide exactly-once processing semantics, and both use the Pub/Sub message ID to do so.

Note that exactly-once "processing" and exactly-once "delivery" are not always the same. Delivery often implies writing to a sink in an external system, which is outside the scope of a Dataflow pipeline. A sink might see duplicate writes when there are retries in the pipeline.
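
To make the sink caveat concrete, here is a minimal sketch in plain Java (no Dataflow or Beam classes involved). Every name in it (IdempotentSinkSketch, AppendingSink, KeyedSink) is hypothetical and only for illustration. It assumes the external system lets you key each write on the message ID, which may or may not be true for your sink. A retried write then overwrites nothing and adds nothing, whereas an append-only sink ends up with a duplicate row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these names come from the Dataflow or Beam SDK.
class IdempotentSinkSketch {

    // A naive sink appends every write, so a retried write shows up twice.
    static final class AppendingSink {
        private final List<String> rows = new ArrayList<>();

        void write(String messageId, String payload) {
            rows.add(messageId + ":" + payload);
        }

        int size() {
            return rows.size();
        }
    }

    // A keyed sink stores at most one row per message ID, so a retry is a no-op.
    static final class KeyedSink {
        private final Map<String, String> rows = new LinkedHashMap<>();

        // Returns true if a new row was stored, false if the ID was already present.
        boolean write(String messageId, String payload) {
            return rows.putIfAbsent(messageId, payload) == null;
        }

        int size() {
            return rows.size();
        }
    }

    public static void main(String[] args) {
        AppendingSink naive = new AppendingSink();
        KeyedSink keyed = new KeyedSink();

        // Simulate a pipeline retry: the write for "id-1" is attempted twice.
        String[][] attempts = {{"id-1", "a"}, {"id-1", "a"}, {"id-2", "b"}};
        for (String[] attempt : attempts) {
            naive.write(attempt[0], attempt[1]);
            keyed.write(attempt[0], attempt[1]);
        }

        System.out.println("naive sink rows: " + naive.size()); // 3
        System.out.println("keyed sink rows: " + keyed.size()); // 2
    }
}
```

In other words, the pipeline processes each message once, but whether the sink ends up with one row per message depends on the sink accepting idempotent writes.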