11 votes

I'm writing a resource handling method where I control access to various files, and I'd like to be able to make use of the browser's cache. My question is two-fold:

  1. Which are the definitive HTTP headers that I need to check in order to know for sure whether I should send a 304 response, and what am I looking for when I do check them?

  2. Additionally, are there any headers that I need to send when I initially send the file (like 'Last-Modified') as a 200 response?

Some pseudo-code would probably be the most useful answer.


What about the Cache-Control header? Can its various possible values (namely max-age) affect what you send to the client, or should only If-Modified-Since be obeyed?

I would just like to add that when sending a 304 response, you should only send the headers and not the content. (GateKiller)
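As a minimal sketch of that point, assuming you write the raw response yourself (the function `write_304` is hypothetical), the whole 304 response is a status line plus headers, ended by the blank line, with nothing after it:

```c
#include <stdio.h>

/* Hypothetical helper: build a complete 304 response into buf.
   A 304 never carries a body, so the blank line that ends the headers
   also ends the response. The ETag is repeated so the cache knows which
   stored copy it may keep using.
   Returns the number of characters written, or -1 if buf is too small. */
int write_304(char *buf, size_t size, const char *etag, const char *http_date)
{
    int n = snprintf(buf, size,
                     "HTTP/1.1 304 Not Modified\r\n"
                     "Date: %s\r\n"
                     "ETag: %s\r\n"
                     "\r\n",
                     http_date, etag);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```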

5 Answers

8 votes

Here's how I implemented it. The code has been working for a bit more than a year and with multiple browsers, so I think it's pretty reliable. It is based on RFC 2616 and on observing what the various browsers were sending, and when.

Here's the pseudocode:

server_etag = gen_etag_for_this_file(myfile)
etag_from_browser = get_header("Etag")

if etag_from_browser does not exist:
    etag_from_browser = get_header("If-None-Match")
if the browser has quoted the etag:
    strip the quotes (e.g. "foo" --> foo)

set server_etag into http header

if etag_from_browser matches server_etag:
    send 304 return code to browser

Here's a snippet of my server logic that handles this.

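/* the client should set either Etag or If-None-Match */
/* some clients quote the parm, strip quotes if so    */
/* (needs <string.h>, plus apr_strings.h for apr_pstrdup/apr_psprintf) */
mketag(etag, &sb);

etagin = apr_table_get(r->headers_in, "Etag");
if (etagin == NULL)
    etagin = apr_table_get(r->headers_in, "If-None-Match");
if (etagin != NULL && etagin[0] == '"') {
    /* the table owns etagin, so strip the quotes from a pool copy */
    char *copy = apr_pstrdup(r->pool, etagin);
    size_t sl = strlen(copy);
    memmove(copy, copy + 1, sl);         /* shift left, NUL included */
    if (sl >= 2 && copy[sl - 2] == '"')
        copy[sl - 2] = '\0';             /* drop the closing quote */
    etagin = copy;
    logit(2, "etag=:%s:", etagin);
}
... 
/* the ETag header value must be a quoted string */
apr_table_add(r->headers_out, "ETag", apr_psprintf(r->pool, "\"%s\"", etag));
... 
if (etagin != NULL && strcmp(etagin, etag) == 0) {
    /* if the etag matches, we return a 304 */
    rc = HTTP_NOT_MODIFIED;
}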
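The snippet above compares the tag with plain equality. If clients send a list of tags, a `*`, or weak tags, a more general check is needed. Here is a sketch of one (not the poster's code; `inm_matches` is a hypothetical name), assuming the server sends its ETag quoted, as RFC 7232 requires:

```c
#include <string.h>

/* Sketch: does an If-None-Match header value match the current entity tag?
   server_etag is the tag exactly as sent in the ETag header, quotes
   included (e.g. "\"abc\"" or "W/\"abc\""). Handles "*", comma-separated
   lists and the W/ prefix, using the weak comparison RFC 7232 prescribes
   for If-None-Match. Only call it for a resource that exists. */
int inm_matches(const char *inm, const char *server_etag)
{
    if (strncmp(server_etag, "W/", 2) == 0)   /* weak comparison ignores W/ */
        server_etag += 2;
    size_t want = strlen(server_etag);

    const char *p = inm;
    for (;;) {
        while (*p == ' ' || *p == '\t' || *p == ',')   /* skip separators */
            p++;
        if (*p == '\0')
            return 0;
        if (*p == '*')
            return 1;                 /* "*" matches any current representation */
        if (strncmp(p, "W/", 2) == 0)
            p += 2;

        const char *end;
        if (*p == '"') {              /* quoted tag: may itself contain commas */
            const char *close = strchr(p + 1, '"');
            if (close == NULL)
                return 0;             /* malformed header */
            end = close + 1;
        } else {                      /* unquoted tag from a lenient client */
            end = p;
            while (*end != '\0' && *end != ',' && *end != ' ' && *end != '\t')
                end++;
        }
        if ((size_t)(end - p) == want && strncmp(p, server_etag, want) == 0)
            return 1;
        p = end;
    }
}
```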

If you want some help with etag generation post another question and I'll dig out some code that does that as well. HTH!

4 votes

A 304 Not Modified response can result from a GET or HEAD request with either an If-Modified-Since ("IMS") or an If-None-Match ("INM") header.

In order to decide what to do when you receive these headers, imagine that you are handling the GET request without these conditional headers. Determine what the values of your ETag and Last-Modified headers would be in that response and use them to make the decision. Hopefully you have built your system such that determining this is less costly than constructing the complete response.

If there is an INM and the value of that header is the same as the value you would place in the ETag, then respond with 304.

If there is an IMS and the date value in that header is the same as or later than the one you would place in the Last-Modified, then respond with 304.

Else, proceed as though the request did not contain those headers.
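A minimal C sketch of that decision follows. The function name and the convention of passing IMS as an already-parsed `time_t` (with `(time_t)-1` meaning absent or unparsable) are assumptions; the INM test is the plain equality described above. One deliberate difference: following the precedence rules in RFC 7232, a present but non-matching INM yields 200 even if the IMS date alone would allow a 304.

```c
#include <string.h>
#include <time.h>

/* Hypothetical helper: choose 304 or 200 for a GET/HEAD.
   our_etag and last_modified are the validators we would put on a full
   200 response; inm is the If-None-Match value or NULL; ims is the parsed
   If-Modified-Since date or (time_t)-1 if absent or unparsable.
   When If-None-Match is present, If-Modified-Since is ignored. */
int conditional_status(const char *inm, time_t ims,
                       const char *our_etag, time_t last_modified)
{
    if (inm != NULL)
        return strcmp(inm, our_etag) == 0 ? 304 : 200;  /* exact match only */
    if (ims != (time_t)-1 && last_modified <= ims)
        return 304;                 /* unchanged since the client's copy */
    return 200;                     /* handle as an unconditional GET */
}
```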

For a least-effort approach to part 2 of your question, figure out which of the Expires, ETag, and Last-Modified headers you can easily and correctly produce in your Web application.

Suggested reading:

http://www.w3.org/Protocols/rfc2616/rfc2616.html

http://www.mnot.net/cache_docs/

3 votes

You should send a 304 if the client has explicitly stated that it may already have the page in its cache. This is called a conditional GET, and it typically includes an If-Modified-Since header in the request.

This request header contains the date of the copy the client has cached (normally the Last-Modified value you sent with it). You should check whether the content has changed after this date and send a 304 if it hasn't.
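A sketch of that check in C, assuming the client sends the preferred IMF-fixdate format (the function names are hypothetical). It avoids the non-standard `timegm()` by converting the calendar date to a day count with the days-from-civil formula:

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Days from 1970-01-01 to y-m-d in the proleptic Gregorian calendar
   (the days-from-civil formula), so we don't need non-standard timegm(). */
static long days_from_civil(long y, int m, int d)
{
    y -= m <= 2;                                   /* count years from March */
    long era = (y >= 0 ? y : y - 399) / 400;
    long yoe = y - era * 400;                      /* year of era, 0..399 */
    long doy = (153L * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;            /* shift epoch to 1970 */
}

/* Parse an IMF-fixdate such as "Sun, 06 Nov 1994 08:49:37 GMT" into
   seconds since the epoch; returns (time_t)-1 if it doesn't parse.
   The obsolete RFC 850 and asctime() formats are not handled. */
time_t parse_http_date(const char *s)
{
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    char mon[4], zone[4];
    int d, y, hh, mm, ss, m = 0;

    if (sscanf(s, "%*3s, %2d %3s %4d %2d:%2d:%2d %3s",
               &d, mon, &y, &hh, &mm, &ss, zone) != 7 || strcmp(zone, "GMT") != 0)
        return (time_t)-1;
    for (int i = 0; i < 12; i++)
        if (strncmp(mon, months + 3 * i, 3) == 0)
            m = i + 1;
    if (m == 0)
        return (time_t)-1;
    return (time_t)((long long)days_from_civil(y, m, d) * 86400
                    + hh * 3600LL + mm * 60LL + ss);
}

/* 1 if the file changed after the client's date (send 200), 0 if not
   (send 304). A date we can't parse is treated as "send everything". */
int modified_since(time_t mtime, const char *ims_header)
{
    time_t ims = parse_http_date(ims_header);
    return ims == (time_t)-1 || mtime > ims;
}
```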

See http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.25 for the related section in the RFC.

2 votes

We are also handling cached, but secured, resources. If you send/generate an ETag header (which RFC 2616 section 13.3 recommends you SHOULD do), then the client MUST use it in a conditional request (typically in an If-None-Match, HTTP_IF_NONE_MATCH, header). If you send a Last-Modified header (again, you SHOULD), then you should check the If-Modified-Since, HTTP_IF_MODIFIED_SINCE, header. If you send both, then the client SHOULD send both, but it MUST send the ETag. Also note that validation is simply defined as checking the conditional headers for strict equality against the ones you would send out. Finally, only a strong validator (such as a strong ETag) will be used for ranged requests (where only part of a resource is requested).
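To illustrate the strong/weak distinction, here is a sketch of the two comparison functions that RFC 7232 defines (the function names are hypothetical), taking the tags exactly as they appear on the wire:

```c
#include <string.h>

/* Sketch of RFC 7232 entity-tag comparison. Tags are given as they
   appear on the wire, e.g. "\"v1\"" (strong) or "W/\"v1\"" (weak). */
static int is_weak(const char *tag) { return strncmp(tag, "W/", 2) == 0; }

/* Strong comparison: both tags strong and byte-for-byte identical.
   This is what range requests need, since the bytes must match exactly. */
int etag_strong_equal(const char *a, const char *b)
{
    return !is_weak(a) && !is_weak(b) && strcmp(a, b) == 0;
}

/* Weak comparison: opaque tags equal after dropping any W/ prefix.
   This is sufficient for If-None-Match on GET/HEAD. */
int etag_weak_equal(const char *a, const char *b)
{
    if (is_weak(a)) a += 2;
    if (is_weak(b)) b += 2;
    return strcmp(a, b) == 0;
}
```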

In practice, since the resources we are protecting are fairly static, and a one second lag time is acceptable, we are doing the following:

  1.  Check to see if the user is authorized to access the requested resource

         If they are not, redirect them or send a 4xx response as appropriate. We generate 404 responses to requests that look like hack attempts or blatant attempts to perform a security end run.

  2.  Compare the If-Modified-Since header to the Last-Modified header we would send (see below) for strict equality

         If they match, send a 304 Not Modified response and exit page processing

  3.  Create a Last-Modified header using the modification time of the requested resource

        Use the HTTP-date format defined in RFC 2616 (section 3.3.1)

  4.  Send out the header and resource content along with an appropriate Content-Type
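Steps 2 and 3 can be sketched in C as follows (function names are hypothetical). The date is built from fixed day and month tables rather than `strftime`, whose `%a`/`%b` output depends on the locale, and step 2 is the strict string comparison described above:

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Step 3: format t as an HTTP-date (IMF-fixdate), e.g.
   "Sun, 06 Nov 1994 08:49:37 GMT". Needs at least 30 bytes of buf.
   Note gmtime() is not thread-safe; POSIX gmtime_r() is the usual fix. */
int format_http_date(time_t t, char *buf, size_t size)
{
    static const char *days[] = {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};
    static const char *months[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                   "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    struct tm *g = gmtime(&t);
    if (g == NULL)
        return -1;
    int n = snprintf(buf, size, "%s, %02d %s %04d %02d:%02d:%02d GMT",
                     days[g->tm_wday], g->tm_mday, months[g->tm_mon],
                     g->tm_year + 1900, g->tm_hour, g->tm_min, g->tm_sec);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Step 2: strict equality between the client's If-Modified-Since and
   the Last-Modified value we would send for this modification time. */
int ims_matches(const char *ims_header, time_t mtime)
{
    char ours[32];
    return ims_header != NULL
        && format_http_date(mtime, ours, sizeof ours) == 0
        && strcmp(ims_header, ours) == 0;
}
```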

We decided to eschew the ETag header since it is overkill for our purposes. I suppose we could also just use the timestamp as an ETag. If we move to a true ETag system, we would probably store computed hashes for the resources and use those as ETags.
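If you do move to hashes, one cheap option is sketched below, assuming 64-bit FNV-1a (chosen here only for illustration; any stable hash of the content works) and hypothetical function names. FNV is not cryptographic, which is fine for change detection:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a hash of len bytes. */
uint64_t fnv1a64(const unsigned char *data, size_t len)
{
    uint64_t h = 14695981039346656037ULL;      /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;                 /* FNV prime */
    }
    return h;
}

/* Hypothetical helper: format the content hash as a quoted (strong) ETag,
   e.g. "\"cbf29ce484222325\"" for empty content. buf needs 19 bytes. */
int make_content_etag(const unsigned char *data, size_t len,
                      char *buf, size_t size)
{
    int n = snprintf(buf, size, "\"%016llx\"",
                     (unsigned long long)fnv1a64(data, len));
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Computing the hash once when a resource changes and storing it avoids rehashing on every request.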

If your resources are dynamically generated, from say database content, then ETags may be better for your needs, since they are just text to be populated as you see fit.

1 vote

Regarding Cache-Control:

You shouldn't have to worry about Cache-Control when serving files, other than setting it to a reasonable value. Its max-age directive tells the browser and other downstream caches (such as a proxy) the maximum time a stored copy may be used before it must be revalidated with the server.
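A sketch of both sides of that, with hypothetical names: the server just emits the header, and a cache treats its stored copy as fresh while the copy's age is below max-age. Once the copy is stale, the cache sends a conditional GET, which is where the 304 handling above comes in. This simplified freshness check ignores the Age header and the clock-skew corrections in RFC 7234:

```c
#include <stdio.h>
#include <time.h>

/* Server side: write a Cache-Control header line into buf.
   Returns the length written, or -1 if buf is too small. */
int write_cache_control(char *buf, size_t size, long max_age)
{
    int n = snprintf(buf, size, "Cache-Control: max-age=%ld\r\n", max_age);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Cache side (simplified): a copy received at received_at may be reused
   without contacting the server while its age is below max_age. */
int is_fresh(time_t received_at, time_t now, long max_age)
{
    double age = difftime(now, received_at);
    return age < (double)max_age;
}
```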