I have a two-part question about the nature of metadata update notifications in GCS. (For the mods: if I should split this into two questions, let me know and I will.)

I have a bucket in Google Cloud Storage with Pub/Sub notifications configured for object metadata changes. I routinely get doubled metadata updates, seemingly out of nowhere. When a notification arrives, a Cloud Run container reads the object it refers to and does some processing that results in:
a) a new file being added;
b) an email being sent.
And that should be the end of it.

However, approximately 10 minutes later, a second notification fires for the same object, with the metageneration incremented but no visible changes in the object described by the notification.
Strangely, the ETag changes slightly (CJ+2tfvk+egCEG0 -> CJ+2tfvk+egCEG4), but the CRC32C and MD5 checksums remain the same, which is expected, since the object's content is not being rewritten.
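To make the distinction between a content change and a metadata-only change concrete, here is a minimal sketch that decodes a Pub/Sub push message from a GCS notification and checks whether a second notification differs from the first only in metadata. It assumes the notification was configured with the JSON_API_V1 payload format, so the message data is the base64-encoded object resource (with `generation`, `metageneration`, `md5Hash` and `crc32c` fields) and the attributes include `eventType`. The helpers `decode_notification`, `is_metadata_only_change` and `make_envelope` are hypothetical, and the sample values are illustrative, not taken from real notifications.

```python
import base64
import json


def decode_notification(envelope: dict) -> tuple:
    """Return (attributes, object_resource) from a Pub/Sub push envelope.

    Assumes the GCS notification uses payload format JSON_API_V1, so
    message.data is the base64-encoded JSON object resource.
    """
    message = envelope["message"]
    attributes = message.get("attributes", {})
    resource = json.loads(base64.b64decode(message["data"]))
    return attributes, resource


def is_metadata_only_change(old: dict, new: dict) -> bool:
    """True if `new` has the same content as `old` but a newer metageneration.

    generation identifies the object data; metageneration counts metadata
    updates within that generation.
    """
    same_content = (
        old["generation"] == new["generation"]
        and old.get("md5Hash") == new.get("md5Hash")
        and old.get("crc32c") == new.get("crc32c")
    )
    return same_content and int(new["metageneration"]) > int(old["metageneration"])


def make_envelope(resource: dict, event_type: str) -> dict:
    """Build a sample push envelope (hypothetical helper, illustrative only)."""
    data = base64.b64encode(json.dumps(resource).encode()).decode()
    return {"message": {"attributes": {"eventType": event_type}, "data": data}}


if __name__ == "__main__":
    first = {"name": "report.csv", "generation": "1000", "metageneration": "1",
             "md5Hash": "md5-A", "crc32c": "crc-A"}
    second = dict(first, metageneration="2")  # same data, newer metadata
    _, old = decode_notification(make_envelope(first, "OBJECT_METADATA_UPDATE"))
    _, new = decode_notification(make_envelope(second, "OBJECT_METADATA_UPDATE"))
    print(is_metadata_only_change(old, new))  # True
```

The checksums are compared in addition to `generation` only as a sanity check; in the scenario described above, both notifications would share the same generation and checksums and differ in metageneration.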

The question is twofold, then:
- What exactly causes the metageneration attribute to increment when no metadata is being set or updated?
- How can the ETag change if the underlying data does not, as shown by the checksums? (I guess the documentation only says that ETags "will change whenever the underlying data changes"[1], which does not strictly mean they cannot change otherwise.)


1: https://cloud.google.com/storage/docs/hashes-etags#_ETags

If the metageneration number increases, the most likely cause is an explicit call from somewhere to update the metadata in some fashion (possibly in the ACLs or somewhere not obvious). Consider enabling Stackdriver or bucket access logs to verify that no extra update call is coming in from somewhere. - Brandon Yarbrough
To your second question: as you mentioned, the complete documentation quote is "users should make no assumptions about those ETags except that they will change whenever the underlying data changes", so, indeed, you cannot assume that the ETag will stay the same when the data does, since this is not guaranteed. - Rafael Lemos

1 Answer

As commented by @Brandon Yarbrough, if the metageneration number increases, the most likely cause is an explicit call from somewhere unexpected to update the metadata in some fashion. One way to verify that no extra update calls are being made is to enable Stackdriver or bucket access logs.

Regarding the ETag changes, the ETag documentation on Cloud Storage states that

Users should make no assumptions about those ETags except that they will change whenever the underlying data changes.

This means that the only case in which the ETag is guaranteed to change is when the underlying data changes; however, other events may trigger an ETag change as well, so you should not use ETags as a reference for file changes.
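Following that advice, a handler can decide whether the object content actually changed from the object's `generation` (GCS assigns a new generation whenever the object data is written) rather than from the ETag. The minimal sketch below assumes JSON_API_V1 payloads that carry `bucket`, `name` and `generation`, and uses a hypothetical in-memory `ContentTracker` to process each (bucket, object, generation) only once, so a later metadata-only notification for the same content is skipped. Note that Cloud Run instances do not share memory and may be restarted, so a real service would need to keep this state in shared storage such as a database.

```python
from dataclasses import dataclass, field


@dataclass
class ContentTracker:
    """Remembers which object contents have already been processed.

    Hypothetical helper: state lives in memory only, which is not enough
    for Cloud Run in practice (no sharing across instances or restarts).
    """

    seen: set = field(default_factory=set)

    def should_process(self, resource: dict) -> bool:
        # Key on the content generation, never on the ETag, which may also
        # change when only the metadata (metageneration) changes.
        key = (resource["bucket"], resource["name"], resource["generation"])
        if key in self.seen:
            return False
        self.seen.add(key)
        return True
```

For example, passing the first notification's object resource returns True, and passing a second one that differs only in `metageneration` and `etag` returns False, while a resource with a new `generation` (the object was rewritten) returns True again.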