I am looking into ways to order list of messages from google cloud pub/sub. The documentation says:
Have a way to determine from all messages it has currently received whether or not there are messages it has not yet received that it needs to process first.
...is possible by using Cloud Monitoring to keep track of the
pubsub.googleapis.com/subscription/oldest_unacked_message_age
metric. A subscriber would temporarily put all messages in some persistent storage and ack the messages. It would periodically check the oldest unacked message age and check against the publish timestamps of the messages in storage. All messages published before the oldest unacked message are guaranteed to have been received, so those messages can be removed from persistent storage and processed in order.
I tested it locally and this approach seems to be working fine.
I have one gripe with it however, and this is not something easily testable by myself.
This solution relies on server-side assigned (by google) publish_time
attribute. How does Google avoid the issues of skewed clocks?
If my producer publishes messages A and then immediately B, how can I be sure that A.publish_time < B.publish_time
is true? Especially considering that the same documentation page mentions internal load-balancers in the architecture of the solution. Is Google Pub/Sub using atomic clocks to synchronize time on the very first machines which see messages and enrich those messages with the current time?
There is an implicit assumption in the recommended solution that the clocks on all the servers are synchronized. But the documentation never explains if that is true or how it is achieved so I feel a bit uneasy about the solution. Does it work under very high load?
Notice I am only interested in relative order of confirmed messages published after each other. If two messages are published simultaneously, I don't care about the order of them between each other. It can be A, B
or B, A
. I only want to make sure that if B is published after A is published, then I can sort them in that order on retrieval.
Is the aforementioned solution only "best-effort" or are there actual guarantees about this behavior?