1
votes

I would like to count the number of messages in the last hour (last hour referring to a timestamp field in the message data).

I currently have a code that will count the messages synchronously (I am using Google Cloud Pub/Sub Synchronous pull), but I noticed it will take quite long.

My code will repeatedly poll the subscription for a predefined (I set it to 100+) number of times so that I am sure there are no more messages in the last hour that are coming in out of order.
This is not an acceptable design because it means the user has to wait for 5-10 mins for the service to count the messages when they want the metric!

Are there best practices in Pub Sub design for solving this kind of problem?
This seems like a simple problem to solve (count the number of events in the last X timeframe) so I thought there might be.

Will asynchronous design help? How would an async design work? I am not too sure about the async and Python future concept (I am using GCP Pub/Sub's Python client library).

1

1 Answers

2
votes

I will try to catch the message differently. My solution is based on logging and BigQuery. The idea is to write a log, for example message received with timestamp xxxxx, to filter this log pattern and to sink the result in BigQuery.

Then, when a user ask, you simply have to request BigQuery and to count the message in the desired lap of time. You also have the advantage to change the time frame, to have an history,...

For writing this log, 2 solutions

  • Cheaper but not really recommended, the process which consume the message log it with it process it. However, you are dependent of an external service. And this service has 2 responsibilities: its work, and this log (for metrics). Not SOLID. Maybe it's can be the role of the publisher with a loge like this: message published at XXXX. However this imply that all the publisher or all the subscribers are on GCP.
  • Better is to plug a function, the cheaper (128Mb of memory) to simply handle the message and write the log.