1
votes

This is kind of a core web development topic, and one that is tough to search for.

I'm running a medium-sized website with ~2500 users a day. We are in heavy development, shipping new features every month. We have Git set up with a master, dev/master and some other development branches. We have a staging server and a dev server, plus we all work locally until we push to dev.

However, when I push changes to dev, or even live, users often have to refresh their cache or else they see errors.

We do use the HTML5 Application Cache, which re-fetches all of its files whenever we change the manifest. But we're not using App Cache for the whole application, just for some resources that make the application MUCH faster.

App Cache aside, this was still a problem in our old site, even without app cache. I know you can do ?timestamp after JS and CSS files. BUT I WANT users to cache these. It speeds up their experience.

So, how does one go about letting users cache content for speed, but get the NEWEST content when I push an update? How do the big boys handle this?

Thanks!

3
You should use the Expires header to expire on the day you deliver. This link shows how to set up this header for external resources (no PHP): webmasters.stackexchange.com/questions/5265/… - Sebas
We generate all JS and CSS files in a subdirectory that reflects the version, so /1.2/styles/style1.css and /1.2/js/lib1.js - mplungjan
Changing the headers every time I deploy means restarting NGINX every time I deploy, which I don't think we can do; that will kick everyone off. - Sean Clark

3 Answers

2
votes

I wanted to put a clear answer here, because the solution to this is AWESOME. This is basically what Kay was saying to do.

In PHP we do this: define("GIT_HASH",exec(GIT." rev-parse --short HEAD 2>&1")); where GIT is the path to your git binary. On Linux it's just git; on a Mac it's something like /usr/local/bin/git

Then we put our GIT hash in JS to be used with require.js

<script>
    window.app_hash = '<?=GIT_HASH;?>' || '';
</script>

Now that we have our hash, we just change the config for require.js

require.config({
    // Prefix every module URL with the deploy hash
    baseUrl: '__' + app_hash,
    // ...remaining require.js options
});

We also have this for hardcoded PHP URLs: <link rel="stylesheet" href="/__<?=GIT_HASH;?>/css/main.css">

Lastly we used an NGINX rewrite rule to allow this

rewrite ^/__[^/]*/(.+) /$1 last;

And for apache in htaccess

RewriteRule ^__[^/]*/(.+) /$1

The __ is just a prefix we put in front of the hash to make it easy to spot. The last flag in NGINX makes it re-run location matching on the rewritten URI, so the rest of our rules still get hit; in .htaccess you don't need it.
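
To make the effect of the two rewrite rules concrete, here is a rough JavaScript sketch of the same mapping. The function name stripVersionPrefix and the sample hash are made up for illustration; the regular expression simply mirrors the NGINX pattern above and is not something either server exposes.

```javascript
// Hypothetical helper mirroring: rewrite ^/__[^/]*/(.+) /$1
function stripVersionPrefix(path) {
  // "/__<anything without a slash>/rest" becomes "/rest";
  // paths without the prefix pass through unchanged.
  return path.replace(/^\/__[^\/]*\/(.+)/, "/$1");
}

console.log(stripVersionPrefix("/__a1b2c3d/css/main.css")); // "/css/main.css"
console.log(stripVersionPrefix("/css/main.css"));           // "/css/main.css"
```

Because the hash segment is thrown away on the server, every deploy produces a brand-new URL for the browser while the files on disk stay where they are.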

The reason we didn't use a query string like ?whatever is that some browsers and proxies won't cache URLs with a query string like that. And we don't want that; we want caching, just not across a deploy.

If you aren't using require.js, you will have to change all of your URLs to this syntax, BUT IT'S WORTH IT.
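
If you are building those URLs by hand, a small helper keeps the syntax in one place. This is only a sketch: versionedUrl is a hypothetical name, and the hash is assumed to come from something like the app_hash variable set earlier.

```javascript
// Hypothetical helper: build "/__<hash>/<path>" for any static asset.
function versionedUrl(hash, path) {
  // Accept paths with or without a leading slash.
  const clean = path.startsWith("/") ? path : "/" + path;
  // An empty hash still yields "/__/...", which the rewrite rule also accepts.
  return "/__" + (hash || "") + clean;
}

console.log(versionedUrl("a1b2c3d", "/css/main.css")); // "/__a1b2c3d/css/main.css"
```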

And if you're using HTML5 app cache, be sure to take out any CSS and JS you have in there.

Thanks Kay!

1
votes

From what I've read, though, browsers cache files differently when you add a query string. So the solution I use is to have URLs look like this:

<script type="text/javascript" src="/resources/cacheholder1/js/site.js"></script>

Every time I build my project and am about to deploy the new version, I increment this number. Of course, that's very annoying when you have dozens or hundreds of these lines. So I wrote a bash script to go through my project and look for anything that matches the following pattern:

/resources/cacheholder(#)/

then take the matched number, increment it, and update/save the file.
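
The script itself is bash and isn't shown here, but the find-and-increment step can be sketched in JavaScript like this. bumpCacheholder is a hypothetical name, and the sketch works on a string; reading and writing the project files is left out.

```javascript
// Hypothetical sketch: bump every /resources/cacheholder<N>/ to <N+1>.
function bumpCacheholder(text) {
  return text.replace(
    /\/resources\/cacheholder(\d+)\//g,
    (match, n) => "/resources/cacheholder" + (parseInt(n, 10) + 1) + "/"
  );
}

const before = '<script src="/resources/cacheholder1/js/site.js"></script>';
console.log(bumpCacheholder(before));
// <script src="/resources/cacheholder2/js/site.js"></script>
```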

Of course, it would probably be wiser to use the project number instead of an arbitrary number, as long as you are actually tracking the project number and it is automatically changed. This works for us right now, so I'm sticking with it, but have been planning to use the project number.

This is supposed to cache the files properly since it's a "new" URL, not just a querystring change. At the same time, it took me a little extra configuration to allow for this URL scheme because that "cacheholder" part changes (the number), so you can't hardcode the URL mapping in your project.

The problem with query strings, as I understand it, is that browsers are not supposed to cache requests that carry one, so in practice there is a mix of browsers that do and browsers that don't. And I'm betting the only one that does (because I remember it happening) is IE. Other browsers seem to follow the spec and not cache requests with a query string.

0
votes

The query string in asset?timestamp causes Firefox to ask the server on every request. This is a waste of resources and of the user's time, even though you could respond with 304.

I use www.example.com/assets/<git hash>/name.js in my projects and it works fine. The revision only changes if the content was edited, so there won't be needless queries.

The static content is set to expire in 1 month:

Last-Modified: Thu, 28 Mar 2013 12:16:21 GMT
Cache-Control: public, max-age=2678400
Expires: Sun, 28 Apr 2013 14:00:58 GMT
ETag: "flask-1364472981.38-9149-1640239173"
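
For reference, max-age=2678400 is 31 days in seconds. A minimal sketch of how such header values could be computed follows; cacheHeaders is a hypothetical helper, not part of Flask or any other framework, and the timestamp is taken from the Expires header above.

```javascript
// Hypothetical helper: far-future caching headers for static assets.
function cacheHeaders(now, days) {
  const maxAge = days * 24 * 60 * 60; // lifetime in seconds
  const expires = new Date(now.getTime() + maxAge * 1000);
  return {
    "Cache-Control": "public, max-age=" + maxAge,
    "Expires": expires.toUTCString(),
  };
}

// A response served on 28 Mar 2013 at 14:00:58 GMT, cached for 31 days.
const headers = cacheHeaders(new Date(Date.UTC(2013, 2, 28, 14, 0, 58)), 31);
console.log(headers["Cache-Control"]); // public, max-age=2678400
console.log(headers["Expires"]);       // Sun, 28 Apr 2013 14:00:58 GMT
```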

In your deployment process you have to replace the <git hash> in your layout files.

You get the revision when an asset was last modified with:

git log --format=%h -1 -- path/to/asset.js
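
Putting the two steps together, the deploy-time replacement could look roughly like the following Node.js sketch. Both function names are hypothetical, the layout is handled as a plain string, and the sketch assumes the placeholder appears literally as <git hash> and that the part after it is the asset's path in the repository; adjust the lookup to your own directory layout.

```javascript
const { execFileSync } = require("child_process");

// Runs the git command shown above; needs git on PATH and a repository in the cwd.
function assetRevision(assetPath) {
  return execFileSync("git", ["log", "--format=%h", "-1", "--", assetPath], {
    encoding: "utf8",
  }).trim();
}

// Replace the "<git hash>" placeholder in every /assets/<git hash>/name URL.
// revisionOf is passed in so the lookup can be swapped out (or stubbed).
function fillGitHash(layout, revisionOf) {
  return layout.replace(
    /\/assets\/<git hash>\/([^"'\s]+)/g,
    (match, name) => "/assets/" + revisionOf(name) + "/" + name
  );
}

// Usage during deployment (assumption: asset names are repo-relative paths):
// const html = fillGitHash(layoutSource, assetRevision);
```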