I'd like advice from people experienced with CDNs: is this a Google CDN bug, or am I missing something? I discovered the problem about 4 months ago and contacted their support, but the experience was so unpleasant that I'd rather not go into it here. They acknowledged the issue, or at least told me they would forward it to the back-end team, but after that the issue tracker entry was deleted and they no longer respond to my emails. That is the main reason I'm asking here.
Problem
Google CDN randomly serves non-gzip content to end users, so instead of ~70KB they download ~500KB files. I cannot reproduce this problem when requesting directly from my origin, but I can reproduce it very easily through Google CDN.
Here is an example request to the CDN:
Request:
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip, deflate, sdch, br
Accept-Language:en-US,en;q=0.8,bg;q=0.6,hr;q=0.4,mk;q=0.2,sr;q=0.2
Cache-Control:no-cache
Connection:keep-alive
Cookie: example
Host: example.com
Pragma:no-cache
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36
Response:
Accept-Ranges:bytes
Age:58422
Alt-Svc:clear
Cache-Control:public, max-age=604800
Content-Length:550158
Content-Type:text/css
Date:Tue, 04 Apr 2017 03:45:53 GMT
Expires:Tue, 11 Apr 2017 03:45:53 GMT
Last-Modified:Sun, 19 Mar 2017 01:50:22 GMT
Server:LiteSpeed
Via:1.1 google
As you can see, my request has an Accept-Encoding header that includes gzip, but the content I receive is not gzipped (there is no Content-Encoding header and Content-Length is 550158). Instead of ~70KB I receive more than 500KB. Also note the Age header: that item has been cached on the CDN for 58422 seconds!
Here is the same request from another machine (US):
Request:
:authority: xxx
:method:GET
:path:/wp-content/themes/365/style.css
:scheme:https
accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
accept-encoding:gzip, deflate, sdch, br
accept-language:en-US,en;q=0.8
cache-control:no-cache
cookie: xxx
pragma:no-cache
upgrade-insecure-requests:1
user-agent:Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
Response:
accept-ranges:bytes
age:58106
alt-svc:clear
cache-control:public, max-age=604800
content-encoding:gzip
content-length:72146
content-type:text/css
date:Tue, 04 Apr 2017 03:49:28 GMT
expires:Tue, 11 Apr 2017 03:49:28 GMT
last-modified:Sun, 19 Mar 2017 01:50:22 GMT
server:LiteSpeed
status:200
vary:Accept-Encoding
via:1.1 google
As you can see, I got gzipped content (72146 bytes) when requesting from my other server.
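The comparison between the two machines can also be scripted. Below is a minimal sketch, assuming only the Python standard library; the function names `is_compressed` and `probe` are made up for illustration, and the URL you pass in is whatever file you want to test. It requests the same URL several times while advertising gzip support, and records whether each response was compressed, along with its Content-Length and Age.

```python
import urllib.request


def is_compressed(headers):
    """Return True if the response headers declare a gzip-encoded body."""
    encoding = headers.get("Content-Encoding", "") or ""
    return "gzip" in encoding.lower()


def probe(url, attempts=10):
    """Request `url` repeatedly with Accept-Encoding: gzip and collect the results.

    Each result is a tuple (compressed?, Content-Length, Age).
    urllib does not decompress bodies itself, so the headers reflect
    exactly what the CDN sent.
    """
    results = []
    for _ in range(attempts):
        req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
        with urllib.request.urlopen(req) as resp:
            results.append((
                is_compressed(resp.headers),
                resp.headers.get("Content-Length"),
                resp.headers.get("Age"),
            ))
    return results
```

Calling `probe` with the CDN URL of the stylesheet from several machines should show a mix of `True` and `False` in the first column whenever the problem is active.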
I have tons of HAR files and also videos that demonstrate this bug, but let's keep it simple. Google CDN logs are available in the GCP dashboard, where you can check what these requests look like.
And even supposing my visitors did not support gzip, what about Googlebot?
I also analyzed my origin server logs and found that about 99% of the responses for that file have the gzip size; only a few requests are not gzipped. That is logical, since some visitors (or, I would rather say, robots) requested the file without an Accept-Encoding: gzip header.
Temporary workaround
If I purge the CDN cache, the problem goes away for the next minutes or hours. After some time, it comes back. The problem also does not happen all the time, only randomly. I have a system that parses the CDN logs and shows me graphs, which is actually how I discovered this bug.
Whenever I see bandwidth on the chart increase (to about double the normal level), I log into the Google dashboard and check the logs, and I find that roughly 50% of the requests for that file are those 500KB responses. It is also easy to reproduce the bug in a browser: I just log in to my servers, request the file, and get random results.
I would be very happy if the problem were in my origin, since I could fix it in 1 minute, but I think it is a Google CDN bug. I would be grateful if someone who knows CDN technology well, or someone from Google Cloud, could help me.
EDIT:
As I said, this bug happens at random times. Here is a video I just recorded that shows a 'NO BUG TIME FRAME'. As you can see, every response is compressed.
EDIT2:
Here is a graph that shows the number of gzip and non-gzip responses for a single .css URL test.
EDIT3:
In the first graph the lines are stacked; here is the same graph without stacking. As you can see, in some hours nearly 100% of the responses are not gzipped.
EDIT4:
Here are my parsed origin logs for the same CSS file.
1060 requests were served with a response size below 100KB (response codes 200, 304 and 206). 32 requests were served with a response size above 100KB (response codes 200 and 206).
EDIT5:
Analyzing the logs for 1-7 April, here are some additional stats for a single .css URL:
19803 CDN requests were served with > 100KB (not gzip)
41004 CDN requests were served with < 100KB (gzip)
29 Cache Fill from origin with > 100KB (not gzip)
924 Cache Fill from origin with < 100KB (gzip)
423 Cache-To-Cache fill with > 100KB (not gzip)
2295 Cache-To-Cache fill with < 100KB (gzip)
I'm surprised how effective Cache-To-Cache fill is. Amazing.
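To make the 1-7 April counts easier to compare, here is a small sketch that turns them into percentages. The numbers are copied from the list above; the names `counts` and `uncompressed_share` are just for illustration.

```python
# (uncompressed, compressed) request counts from the 1-7 April logs above
counts = {
    "CDN responses": (19803, 41004),
    "Cache fill from origin": (29, 924),
    "Cache-to-cache fill": (423, 2295),
}


def uncompressed_share(uncompressed, compressed):
    """Percentage of requests that were served uncompressed."""
    return 100.0 * uncompressed / (uncompressed + compressed)


for label, (big, small) in counts.items():
    print(f"{label}: {uncompressed_share(big, small):.1f}% uncompressed")
```

The gap is the interesting part: only about 3% of fills from the origin were uncompressed, yet roughly a third of the responses the CDN sent to users were.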
SOLUTION
There is no bug in the origin, nor in Google CDN. The problem occurs when Google CDN receives a cacheable response without 'Vary: Accept-Encoding', which happens when the request did not send 'Accept-Encoding: gzip'. Google CDN then stores that uncompressed response and overwrites all the compressed cache entries it had stored. So the next time a user requests the file, for example a .css file, Google CDN will reason like this:
- I received this file from the origin and it does not vary by anything.
- Send the uncompressed response.
Be aware that the web servers I tested are not configured to send 'Vary: Accept-Encoding' in responses to requests that do not have an 'Accept-Encoding: gzip' header. I tested this on LiteSpeed, Apache, Nginx and Cloudflare Nginx.
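The overwrite behaviour described above can be illustrated with a toy cache. This is only a sketch of the behaviour as I observed it, not Cloud CDN's actual implementation; the class `ToyCache` and its methods are hypothetical, and real caches normalize Accept-Encoding instead of comparing the raw string.

```python
class ToyCache:
    """Toy model of a shared cache that honors Vary. NOT Cloud CDN's real code."""

    def __init__(self):
        # url -> (names of request headers the response varies on, {variant key: body})
        self.entries = {}

    @staticmethod
    def _variant_key(vary, request_headers):
        req = {k.lower(): v for k, v in request_headers.items()}
        return tuple(req.get(name, "") for name in vary)

    def store(self, url, request_headers, response_headers, body):
        raw = response_headers.get("Vary", "")
        vary = tuple(p.strip().lower() for p in raw.split(",") if p.strip())
        old_vary, variants = self.entries.get(url, (vary, {}))
        if old_vary != vary:
            # The new response declares different variance rules,
            # so every previously stored variant is discarded.
            variants = {}
        variants[self._variant_key(vary, request_headers)] = body
        self.entries[url] = (vary, variants)

    def lookup(self, url, request_headers):
        if url not in self.entries:
            return None
        vary, variants = self.entries[url]
        return variants.get(self._variant_key(vary, request_headers))


def simulate(origin_always_sends_vary):
    """Return what a gzip-capable client gets after a non-gzip client hit the cache."""
    cache = ToyCache()
    url = "/style.css"
    gzip_request = {"Accept-Encoding": "gzip"}
    plain_request = {}  # e.g. a robot that sends no Accept-Encoding

    # 1. A gzip-capable client fills the cache with the compressed variant.
    cache.store(url, gzip_request, {"Vary": "Accept-Encoding"}, "gzip body")
    # 2. A client without gzip support triggers a fill of the uncompressed file.
    plain_headers = {"Vary": "Accept-Encoding"} if origin_always_sends_vary else {}
    cache.store(url, plain_request, plain_headers, "plain body")
    # 3. The next gzip-capable client asks for the same file.
    return cache.lookup(url, gzip_request)
```

In this model `simulate(False)` hands the uncompressed body to the gzip-capable client, while `simulate(True)` keeps both variants and returns the compressed one.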
I highly recommend that the Google team update the documentation about this. There is a statement about 'Vary headers', but nobody will get the point regarding this issue from it: neither I, nor Google first-level support (I also had 20 days of communication on the Google issue tracker with two Google support people), nor Stack Overflow, nor anyone else was able to identify the problem.
Additionally, the documentation says:
In addition to the request URI, Cloud CDN respects any Vary headers that instances include in responses.
But it says nothing about what happens when a response does not have a 'Vary' header.
This is how I fixed it:
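# Requires mod_headers. Always add Accept-Encoding to Vary for compressible files,
# even when the request did not ask for gzip.
<FilesMatch "\.(js|css|xml|gz|html|txt|xsd|xsl|svg|svgz)$">
Header merge Vary Accept-Encoding
</FilesMatch>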
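To check that the fix is active, you can request the file directly from the origin without asking for gzip and confirm that the response still carries 'Vary: Accept-Encoding'. A minimal sketch using only the Python standard library; the function names are made up for illustration, and the URL you pass should point at your origin, not at the CDN.

```python
import urllib.request


def vary_includes(vary_value, header_name):
    """Return True if a Vary header value lists `header_name` (case-insensitive)."""
    if not vary_value:
        return False
    names = [part.strip().lower() for part in vary_value.split(",")]
    return header_name.lower() in names


def origin_sends_vary(url):
    """Request `url` without gzip support and check the Vary response header."""
    # "identity" explicitly tells the server we do not accept compressed content.
    req = urllib.request.Request(url, headers={"Accept-Encoding": "identity"})
    with urllib.request.urlopen(req) as resp:
        # A response may carry several Vary headers; join them before checking.
        vary = ", ".join(resp.headers.get_all("Vary") or [])
    return vary_includes(vary, "Accept-Encoding")
```

If `origin_sends_vary` returns False for a cacheable .css or .js URL, the CDN can still end up storing an uncompressed response that does not vary by anything.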