0
votes

I'm building a job queue by using Cloud Pub/Sub and I want to receive the messages in the order that the Pub/Sub service receives them. I created a topic and a subscription with message ordering enabled. I'm developing my system in Python with the google-cloud-pubsub package. As suggested in this doc, I must publish messages with ordering keys.

If messages have the same ordering key and you publish the messages to the same region, subscribers can receive the messages in order.

On the subscriber side, I need to process messages in batch, so I use the max_messages parameter to control. However, when I enable the message ordering option, for every time I can't pull max_messages messages as expected but only one message from the subscription. Strangely, when I disable the message ordering, it returns max_messages messages.

Publisher code:

...
topic_path = 'xxx'
ordering_key = '202011240000'
while True:
    job = {'job_id': 'xxxxxx', 'foo': 0, 'bar': 0}
    data = json.dumps(job, default=str).encode('utf-8')
    publisher.publish(topic_path, data=data, ordering_key=ordering_key)
    time.sleep(1)

Subscriber code:

...
subscription_path = 'xxx'
subscriber.pull(request={'subscription': subscription_path, 'max_messages': 300})
...

I did something wrong or Pub/Sub is designed as this?

1

1 Answers

1
votes

The max_messages property does not mean that the server will be guaranteed to return that number of messages, even if they are available. With ordered delivery, it is even less likely that batches of messages returned to individual pull requests will contain the maximum number of messages as there is more coordination that has to be done to ensure messages are being sent in order, particularly if you are using a single ordering key. The server attempts to not hold requests waiting for more messages to send for too long because otherwise the end-to-end latency could get much harder.

There are two ways to deal with this. The first is to switch to the Cloud Pub/Sub client library, which uses streaming pull and therefore is in a much better position to deliver messages as soon as they are available because there is a persistent connection to which to deliver the messages.

The second is to ensure you have a lot of pulls outstanding simultaneously. Note that this won't help with the case of a single ordering key as only one list of messages for an ordering key can be outstanding at a time. If you have many ordering keys, this will probably help.

For more information on the delivery semantics, see the "Receiving Messages in Order" section of the ordering keys Medium post.