2 votes

I have a setup where I am publishing messages to Google Cloud PubSub service.

I wish to get the size of each individual message that I am publishing to PubSub. For this, I identified the following approaches (note: I am using the Python clients for publishing and subscribing, following the line-by-line implementation presented in their documentation):

  • View the message size from the Google Cloud Console using the 'Monitoring' feature
  • Create a pull subscription client and view the size using message.size in the callback function for the messages that are being pulled from the requested topic.
  • Estimate the size of the messages before publishing by converting them to JSON as per the PubSub message schema and using sys.getsizeof()
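For the third approach, here is a minimal sketch (using the same sample payload as below) comparing `sys.getsizeof()` with the length of the serialized bytes. Note that `sys.getsizeof()` reports the in-memory size of the Python object, including interpreter overhead, so the encoded byte count is presumably the closer estimate of what goes over the wire:

```python
import json
import sys

# Same sample payload as the message below.
message = {
    "data": "Test_message",
    "attributes": {
        "dummyField1": "dummyFieldValue1",
        "dummyField2": "dummyFieldValue2",
    },
}

serialized = json.dumps(message)
print(sys.getsizeof(serialized))        # in-memory size of the str object
print(len(serialized.encode("utf-8")))  # bytes actually serialized
```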

For a sample message like the following, which I published using a Python publisher client:

{
  "data": "Test_message",
  "attributes": {
    "dummyField1": "dummyFieldValue1",
    "dummyField2": "dummyFieldValue2"
  }
}

I get 101 as the message.size output from the following callback function in the subscription client:

def callback(message):
    print(f"Received {message.data}.")
    if message.attributes:
        print("Attributes:")
        for key in message.attributes:
            value = message.attributes.get(key)
            print(f"{key}: {value}")
    print(message.size)
    message.ack()

However, the size displayed in Cloud Console Monitoring is around 79 B.

So these are my questions:

  • Why are the sizes different for the same message?
  • Is the output of message.size in bytes?
  • How do I view the size of a message before publishing using the python client?
  • How do I view the size of a single message on the Cloud Console, rather than an aggregated measure of size during a given timeframe, which is all I could find in the Monitoring section?
According to the documentation, message.size is an attribute that "Returns the size of the underlying message, in bytes". Regarding your question about message_sizes, this metric is the "Distribution of publish message sizes (in bytes)". It is sampled every 60 seconds, and after sampling the data is not visible for up to 240 seconds (link). Could you tell me the reason you want to check the message size before publishing? - Alexandre Moraes
Also, would message.size and `message_sizes` (as mentioned above) satisfy your needs? - Alexandre Moraes
@AlexandreMoraes I wish to know the size of the messages being published so I can estimate the data flow when messages are published at a specified rate for a specified number of days. This, in turn, is to estimate how much it would cost, and whether it would stay within the free tier. - Ishwar Venugopal
According to the Python library documentation, message.size is only available as a message attribute on the subscriber side. Otherwise, you will have to use Cloud Monitoring and alerts, which is very useful if you want to monitor your quota expenditure. Did all this information help you? - Alexandre Moraes
Yes, please. That would be fine. - Ishwar Venugopal

2 Answers

1 vote

In order to further contribute to the community, I am summarising our discussion as an answer.

  1. Regarding message.size: it is an attribute of a message in the subscriber client. According to the documentation, its definition is:

Returns the size of the underlying message, in bytes

Thus you would not be able to use it before publishing.

  2. On the other hand, message_sizes is a metric in Google Cloud Metrics, and it is used by Cloud Monitoring, here.

Finally, the last topic discussed was that your aim is to monitor your quota expenditure so you can stay within the free tier. For this reason, the best option would be to use Cloud Monitoring and set up alerts based on metrics such as pubsub.googleapis.com/topic/byte_cost. Here are some links where you can find more about it: Quota utilisation, Alert event based, Alert Policies.
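As an illustration, a Cloud Monitoring filter selecting that metric for Pub/Sub topics might look like the following (sketch only; check the filter syntax against the Monitoring documentation):

```
metric.type = "pubsub.googleapis.com/topic/byte_cost"
resource.type = "pubsub_topic"
```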

1 vote

Regarding your third question about viewing the message size before publishing: the billable message size is the sum of the message data, the attributes (key plus value), 20 bytes for the timestamp, and some bytes for the message_id. See the Cloud Pub/Sub Pricing guide. Note that a minimum of 1000 bytes is billed regardless of message size, so if your messages may be smaller than 1000 bytes, it is important to have good batch settings. The message_id is assigned server-side and is not guaranteed to be a particular size, but it is returned by the publish call as a future, so you can inspect examples of it. This should allow you to get a fairly accurate estimate of message cost within the publisher client. Note that you can also use the monitoring client library to read Cloud Monitoring metrics from within the Python client.
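Putting those pieces together, a rough estimator might look like this. The 16-byte message_id length is an assumption (the server-assigned ID is not a fixed size), and the 1000-byte floor reflects the billing minimum mentioned above:

```python
def estimate_billable_size(data: bytes, attributes: dict,
                           message_id_len: int = 16) -> int:
    """Rough per-message billable size: data + attribute keys/values
    + 20 bytes for the timestamp + an assumed message_id length."""
    attr_bytes = sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
                     for k, v in attributes.items())
    raw = len(data) + attr_bytes + 20 + message_id_len
    return max(raw, 1000)  # at least 1000 bytes are billed per message

print(estimate_billable_size(
    b"Test_message",
    {"dummyField1": "dummyFieldValue1", "dummyField2": "dummyFieldValue2"},
))  # → 1000 (raw estimate is 102 bytes, below the billing minimum)
```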

Regarding your fourth question, there's no way to extract single data points from a distribution metric (unless you have published only one message during the time period of the query, in which case the mean would tell you the size of that one message).