To answer your question about why caching is working, even though the web-server didn't include the headers:
- Expires:
[a date]
- Cache-Control: max-age=
[seconds]
The server kindly asked any intermediate proxies to not cache the contents (i.e. the item should only be cached in a private cache, i.e. only on your own local machine):
But the server forgot to include any sort of caching hints:
- they forgot to include Expires, so the browser knows to use the cached copy until that date
- they forgot to include Max-Age, so the browser knows how long the cached item is good for
- they forgot to include E-Tag, so the browser can do a conditional request
But they did include a Last-Modified date in the response:
Last-Modified: Tue, 16 Oct 2012 03:13:38 GMT
Because the browser knows the date the file was modified, it can perform a conditional request. It will ask the server for the file, but instruct the server to only send the file if it has been modified since 2012/10/16 3:13:38:
GET / HTTP/1.1
If-Modified-Since: Tue, 16 Oct 2012 03:13:38 GMT
The server receives the request, realizes that the client has the most recent version already. Rather than sending the client 200 OK
, followed by the contents of the page, instead it tells you that your cached version is good:
304 Not Modified
Your browser did have to suffer the delay of sending a request to the server, and wait for a response, but it did save having to re-download the static content.
Why Max-Age? Why Expires?
Because Last-Modified sucks.
Not everything on the server has a date associated with it. If I'm building a page on the fly, there is no date associated with it - it's now. But I'm perfectly willing to let the user cache the homepage for 15 seconds:
200 OK
Cache-Control: max-age=15
If the user hammers F5, they'll keep getting the cached version for 15 seconds. If it's a corporate proxy, then all 67198 users hitting the same page in the same 15-second window will all get the same contents - all served from close cache. Performance win for everyone.
The virtue of adding Cache-Control: max-age
is that the browser doesn't even have to perform a conditional request.
- if you specified only
Last-Modified
, the browser has to perform a request If-Modified-Since
, and watch for a 304 Not Modified
response
- if you specified
max-age
, the browser won't even have to suffer the network round-trip; the content will come right out of the caches
The difference between "Cache-Control: max-age" and "Expires"
Expires
is a legacy equivalent of the modern (c. 1998) Cache-Control: max-age
header:
Expires
: you specify a date (yuck)
max-age
: you specify seconds (goodness)
And if both are specified, then the browser uses max-age
:
200 OK
Cache-Control: max-age=60
Expires: 20180403T192837
Any web-site written after 1998 should not use Expires
anymore, and instead use max-age
.
What is ETag?
ETag is similar to Last-Modified, except that it doesn't have to be a date - it just has to be a something.
If I'm pulling a list of products out of a database, the server can send the last rowversion
as an ETag, rather than a date:
200 OK
ETag: "247986"
My ETag can be the SHA1 hash of a static resource (e.g. image, js, css, font), or of the cached rendered page (i.e. this is what the Mozilla MDN wiki does; they hash the final markup):
200 OK
ETag: "33a64df551425fcc55e4d42a148795d9f25f89d4"
And exactly like in the case of a conditional request based on Last-Modified:
GET / HTTP/1.1
If-Modified-Since: Tue, 16 Oct 2012 03:13:38 GMT
304 Not Modified
I can perform a conditional request based on the ETag:
GET / HTTP/1.1
If-None-Match: "33a64df551425fcc55e4d42a148795d9f25f89d4"
304 Not Modified
An ETag
is superior to Last-Modified
because it works for things besides files, or things that have a notion of date. It just is