3
votes

Some months ago we activated Cloud CDN for our storage buckets. Our storage data is regularly changed via a backend, so to invalidate the cached version we added a query parameter containing the changedDate to the URL that is served to the client.

Back then this worked well.

Sometime in the last few months (probably weeks), Google seems to have changed this, and Cloud CDN now ignores the query string when caching content from storage buckets.

  • First part: Does anyone know why this was changed and why no one was notified about it?
  • Second part: How can you invalidate the cache for a particular object in a storage bucket without sending a cache-invalidation request every time (which you shouldn't do)?

I don't like the idea of deleting the old file and uploading a new file with a changed filename every time something is uploaded...

EDIT: For clarification: the official documentation ( cloud.google.com/cdn/docs/caching ) already states that query strings are now ignored for storage buckets:

For backend buckets, the cache key consists of the URI without the query string. Thus https://example.com/images/cat.jpg, https://example.com/images/cat.jpg?user=user1, and https://example.com/images/cat.jpg?user=user2 are equivalent.
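To make the quoted rule concrete, here is a minimal Python sketch that mimics it: the query string is dropped before the URL is used as a cache key. The function name `backend_bucket_cache_key` is made up for illustration, and the real Cloud CDN key derivation is internal and may differ in detail; this only models the behavior the documentation describes.

```python
from urllib.parse import urlsplit, urlunsplit


def backend_bucket_cache_key(url: str) -> str:
    """Model of the documented rule: the URI without its query string.

    Hypothetical helper, not part of any Google library.
    """
    parts = urlsplit(url)
    # Keep scheme, host and path; discard query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


urls = [
    "https://example.com/images/cat.jpg",
    "https://example.com/images/cat.jpg?user=user1",
    "https://example.com/images/cat.jpg?user=user2",
]
keys = {backend_bucket_cache_key(u) for u in urls}
# All three URLs collapse to a single key, so a changedDate query
# parameter cannot select a fresh cache entry under this rule.
print(keys)
```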

3
What do you have set for the CDN cache key? Edit your question with the CDN configuration. This document might help you: cloud.google.com/cdn/docs/caching - John Hanley
That's exactly the point: they changed it so that you can't set it for storage buckets. It says so in the document if you scroll down. - Markus Zancolò

3 Answers

3
votes

We were also affected by this. After we contacted Google Support, they confirmed that this is a permanent change. The recommended workaround is either to use versioning in the object name or to use cache invalidation. The latter sounds a bit odd, as the cache invalidation documentation states:

Invalidation is intended for use in exceptional circumstances, not as part of your normal workflow.
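As a rough illustration of the "versioning in the object name" option, the sketch below derives the object name from a hash of the file content, so a changed file automatically gets a new name (and therefore a new cache key). The helper `versioned_object_name` and the 12-character digest length are assumptions chosen for this example, not something Google Support prescribed.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_object_name(object_name: str, content: bytes, digest_len: int = 12) -> str:
    """Insert a short content hash before the file extension.

    Hypothetical helper: 'images/cat.jpg' becomes 'images/cat.<hash>.jpg'.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(object_name)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old_name = versioned_object_name("images/cat.jpg", b"old bytes")
new_name = versioned_object_name("images/cat.jpg", b"new bytes")
# Different content yields a different object name, so the CDN
# treats the new upload as a separate cache entry.
print(old_name, new_name)
```

The backend would then have to hand the new name to the client instead of appending a query parameter, and old objects would need to be cleaned up separately.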

0
votes

For backend buckets, the cache key consists of the URI without the query string, as the official documentation states [1]. The bucket does not evaluate the query string, but the CDN should still do so. I reproduced this scenario, and in my test it is currently still possible to use a query string as a cache buster.

It seems the reason for the change is that the old behavior resulted in lost caching opportunities, higher costs, and higher latency. The only recommended workarounds for now are to create new objects with the version incorporated into the object's name (which does not seem to be a valid option in your case) or to use cache invalidation.

Invalidating the cache for a particular object requires an explicit invalidation request for that object. A Cache-Control header that allows such objects to be cached only for a limited time may be a workaround for you. The Cloud CDN cache has an expiration time defined by the "Cache-Control: s-maxage", "Cache-Control: max-age", and/or Expires headers [2].
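A small sketch of what such a header value could look like, assuming you want a short lifetime in the CDN and in browsers. The function name and the TTL values are placeholders for illustration; the right numbers depend on how stale your content is allowed to be.

```python
def cache_control_value(browser_ttl: int, cdn_ttl: int) -> str:
    """Build a Cache-Control value with separate browser and shared-cache TTLs.

    Hypothetical helper. max-age applies to browsers, s-maxage to shared
    caches such as a CDN. Both are given in seconds.
    """
    if browser_ttl < 0 or cdn_ttl < 0:
        raise ValueError("TTLs must be non-negative")
    return f"public, max-age={browser_ttl}, s-maxage={cdn_ttl}"


# Example: browsers may reuse the object for 60 s, the CDN for 300 s.
header = cache_control_value(browser_ttl=60, cdn_ttl=300)
print(header)
```

The resulting string would be stored as the Cache-Control metadata of the object in the bucket; with the google-cloud-storage Python client that is typically done by assigning it to `blob.cache_control` before uploading. Changed content then becomes visible once the TTL has expired, without an invalidation request.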


0
votes

According to the documentation, when a backend bucket is used as the origin for Cloud CDN, query strings in the request URL are not included in the cache key:

For backend buckets, the cache key consists of the URI without the protocol, host, or query string.

Using the query string to identify different versions of cached content may not be a best practice promoted by GCP, but for legacy reasons it is sometimes unavoidable.

So, one way to work around this is to serve the bucket as a static website (do NOT enable CDN on it), then use a custom origin (Cloud CDN in front of a backend service backed by an Internet network endpoint group) that points to that static website.

For a backend service, the query string IS part of the cache key:

For backend services, Cloud CDN defaults to using the complete request URI as the cache key

That's it. Yes, it is tedious, but it works!