We’re in the process of designing a caching strategy for a heavily used website. The site consists of a mix of dynamic and static content. The front end is PHP, the middle tier is Tomcat, and MySQL is at the back.

Only the user login screen is served over HTTPS to secure the credentials. After that, all content is served over plain HTTP. Some of the screens are specific to the customer (let’s say their last orders), while other screens are common to everybody (most popular products, promotions, rules, etc.).

Given the expected traffic volume, it’s clear that we need a comprehensive caching strategy. So we’re considering the following options:

  1. Put Squid or Varnish in front of PHP and configure it to cache all public content, and even a customer's order submission form.
  2. Use memcached from PHP to cache page fragments (such as most popular products).
  3. Implement caching in the middle tier (Tomcat), i.e. before returning content to the web servers, try to fetch it from a local cache such as ehcache.
  4. Use a PHP-level cache like Zend Cache and store page fragments there. This is close to the second option, but it's built into the Zend framework.

It’s possible that we will use a combination of those strategies.

So the question is whether it's worthwhile to add a front cache like Varnish, or just use Zend Cache inside the application?


Thanks again, Philopator.

Whichever strategy you go for, bear in mind that having Varnish cache pages means you can take those page loads off PHP entirely, which in itself can make a big difference to how much traffic you can handle. - ZoFreX

1 Answer

I've done quite a few projects like this and found that:

  • Creating a (complete) custom solution is hard and expensive. Luckily you've already found Squid/Varnish, memcached and ehcache.
  • The dynamic behaviour of sites differs a lot, and you know your site best, so it makes sense to devise a caching strategy specific to it.
  • It makes sense to deploy multiple layers of cache. However, this will complicate the behaviour of your site, so you should tell everybody involved with the site (e.g. the business side) something about it, and tell your engineers a lot about it.
  • Think about how you're going to debug problems, e.g. add headers that indicate the freshness of the data served, and allow certain people to purge or bypass the cache.
  • Regularly check how the different cache layers perform (e.g. use Nagios plugins for your Varnish machines).
  • Measure where your performance problems are before you build any caches :)
  • Caching certain objects for just a short while can already be a very significant improvement.
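
To make the debugging and short-lifetime points a bit more concrete, here is a rough sketch. It's in Python purely for illustration (your stack is PHP and Java, and the idea carries over). The names `TTLCache` and `serve` are made up, and `X-Cache` is just a conventional header name I'm assuming, not something any of the tools above require. `Age` is a standard HTTP header.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}            # key -> (value, stored_at)

    def get(self, key):
        """Return (value, age_in_seconds), or None if missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        age = self.clock() - stored_at
        if age >= self.ttl:
            del self._store[key]    # expired: drop it and report a miss
            return None
        return value, age

    def set(self, key, value):
        self._store[key] = (value, self.clock())

    def purge(self, key):
        """Let an operator throw away one entry by hand."""
        self._store.pop(key, None)


def serve(cache, key, render, bypass=False):
    """Return (body, headers); the headers expose what the cache did."""
    hit = None if bypass else cache.get(key)
    if hit is not None:
        body, age = hit
        return body, {"X-Cache": "HIT", "Age": str(int(age))}
    body = render()                 # the expensive part we want to avoid
    cache.set(key, body)
    return body, {"X-Cache": "BYPASS" if bypass else "MISS", "Age": "0"}
```

Even a `ttl` of a few seconds means a popular fragment is rendered once per interval rather than once per request, and the headers tell you at a glance whether a stale-looking page came from the cache.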

These days I like Varnish a lot: it's a separate layer that doesn't clutter the Java/PHP code, and it's fast and very flexible. The downside is that configuration in VCL is a bit too complex.
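
A separate cache layer still has to know which responses are safe to share between users. One common approach is to have the application label each response with the standard `Cache-Control` header. The sketch below is Python for illustration only; `cache_control_for` and its parameters are hypothetical names, and whether your front cache honours these directives depends on how you configure it.

```python
def cache_control_for(page_is_personal, shared_ttl_seconds=60):
    """Pick a Cache-Control header for a response.

    Personal pages (e.g. a customer's last orders) are marked `private`
    so that a shared cache in front of the site should not store them.
    Common pages (popular products, promotions) may be stored by a shared
    cache for `shared_ttl_seconds`, via the `s-maxage` directive.
    """
    if page_is_personal:
        return {"Cache-Control": "private"}
    return {"Cache-Control": f"public, s-maxage={shared_ttl_seconds}"}
```

For example, `cache_control_for(False, 120)` yields `public, s-maxage=120`, while any customer-specific screen gets `private`.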

I typically use ehcache with in-memory storage to avoid latency (e.g. from database queries or service requests) when the data set is small, and memcached when there's a lot of data and the cache needs to be shared by multiple nodes.
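
If you end up layering the two, the lookup order might look roughly like the sketch below (Python for illustration only). `DictSharedCache` is a stand-in I made up for a shared cache such as memcached, and `TwoTierCache` is likewise a hypothetical name. Neither is a real client API, and the local tier here has no expiry or size limit, which a real one would need.

```python
class DictSharedCache:
    """Stand-in for a cache shared by all nodes (assumed to offer get/set)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class TwoTierCache:
    """Per-node in-memory cache in front of a shared cache."""

    def __init__(self, shared, loader):
        self.local = {}         # fastest tier, private to this node
        self.shared = shared    # slower tier, visible to every node
        self.loader = loader    # slowest path: database or service call

    def get(self, key):
        """Return (value, tier) where tier says which layer answered."""
        if key in self.local:
            return self.local[key], "local"
        value = self.shared.get(key)
        if value is not None:
            self.local[key] = value     # remember it for next time
            return value, "shared"
        value = self.loader(key)
        self.shared.set(key, value)     # let the other nodes benefit
        self.local[key] = value
        return value, "origin"
```

With two nodes sharing one `DictSharedCache`, the first request on the first node goes to the origin, a repeat on that node is answered locally, and the same key on the second node is answered from the shared tier without touching the loader.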