
I have a small project in which I publish a number of messages (~1000) and then process them on a single thread, yet I still receive duplicates.

Is this the expected behavior of Pub/Sub?

Here is the code that creates the subscriber:

    // use a single-threaded executor so messages are processed one at a time
    ExecutorProvider executorProvider =
            InstantiatingExecutorProvider.newBuilder().setExecutorThreadCount(1).build();

    // create subscriber
    subscriber = Subscriber.newBuilder(subscriptionName, messageReceiver)
            .setExecutorProvider(executorProvider)
            .build();
    subscriber.startAsync();

Here is the demo: https://github.com/andonescu/play-pubsub

I pushed 1000 messages; each one took 300 milliseconds to process (a delay added intentionally), after which ack() was called. The ack deadline on the subscription is 10 seconds. Based on all this I should not receive duplicate messages, yet more than 10% of the messages sent were redelivered.
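A possible explanation, assuming many messages are delivered to the client well before they are processed: a single thread at 300 ms per message needs about 5 minutes to drain 1000 messages, far past a 10-second ack deadline, so any message whose lease is not extended in time gets redelivered. A minimal back-of-the-envelope sketch (the numbers are the ones from the question, not measured values):

```java
// Sketch: why a 10-second ack deadline can be blown with these numbers.
public class AckDeadlineMath {
    public static void main(String[] args) {
        int messages = 1000;              // messages pushed (from the question)
        long perMessageMillis = 300;      // intentional delay per message
        long ackDeadlineMillis = 10_000;  // subscription ack deadline: 10 s

        long totalMillis = (long) messages * perMessageMillis;
        System.out.println("total processing time: " + totalMillis + " ms");

        // A message delivered early but processed near the end waits far past
        // the deadline unless the client library keeps extending its lease.
        System.out.println("exceeds ack deadline? " + (totalMillis > ackDeadlineMillis));
    }
}
```

The client library does extend leases automatically, but any hiccup in that extension (or in the service) surfaces as a redelivery, which is consistent with at-least-once semantics.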

Here is the log: https://github.com/andonescu/play-pubsub/blob/master/reports/1000-messages-reader-status

I've also opened the same question at https://github.com/GoogleCloudPlatform/pubsub/issues/182


1 Answer


Looking very attentively through the Pub/Sub documentation, I discovered the following part:

However, messages may sometimes be delivered out of order or more than once. In general, accommodating more-than-once delivery requires your subscriber to be idempotent when processing messages. You can achieve exactly once processing of Cloud Pub/Sub message streams using Cloud Dataflow PubsubIO. PubsubIO de-duplicates messages on custom message identifiers or those assigned by Cloud Pub/Sub.

https://cloud.google.com/pubsub/docs/subscriber#at-least-once-delivery

It seems that Cloud Dataflow PubsubIO is the key in my case.

Or attach a unique ID to each message and do the de-duplication on the client side :)
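A minimal sketch of that client-side option, assuming each message carries a unique identifier (Pub/Sub's own message ID, or a custom attribute you set when publishing). The `Deduper` class here is hypothetical, not part of any library:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical client-side de-duplicator: remembers the IDs it has
// already seen and reports whether an ID is new.
public class Deduper {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    // Returns true the first time an ID is offered, false for duplicates.
    // Set.add is atomic on a concurrent set, so this is thread-safe.
    public boolean firstTime(String messageId) {
        return seen.add(messageId);
    }

    public static void main(String[] args) {
        Deduper deduper = new Deduper();
        System.out.println(deduper.firstTime("msg-1")); // true: process it
        System.out.println(deduper.firstTime("msg-1")); // false: skip, but still ack
    }
}
```

In the message receiver you would still ack() duplicates (otherwise they keep coming back); you just skip the processing step. Note that an in-memory set grows without bound and is lost on restart, so a real setup would need an expiry policy or external store.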