Fun with caching in PHP with APC (and others)

After installing APC, I looked through the documentation on php.net and noticed 3 interesting functions with regards to session-independent data caching in PHP;

When talking about caching, apc_delete might not be that important, as apc_store allows you to set the TTL (time to live) of the variable you’re storing. If you try to retrieve a stored variable which exceeded the TTL, APC will return FALSE, which tells you to update your cache.
All this means that adding 5 minutes worth of caching to your application could be as simple as doing;

if (($stringValue=apc_fetch($stringKey)) === FALSE) {
$stringValue = yourNormalDogSlowFunctionToGetValue($stringKey);
apc_store($stringKey,$stringValue,300);
}

From a security point-of-view however (esp. on a shared environment) the APC-functions should be considered extremely dangerous. There are no mechanisms to prevent a denial of service; everyone who “does PHP” on a server can fill the APC-cache entirely. Worse yet, using apc_cache_info you can get a list of all keys which you in turn can use to retrieve all associated values, meaning data theft can be an issue as well. But if you’re on a server of your own (and if you trust all php-scripts you install on there), the APC-functions can be sheer bliss!
And off course other opcode caching components such as XCache and eAccelerator offer similar functionality (although it’s disabled by default in eAccelerator because of the security concerns).

Cache header magic (or how I learned to love http response headers)

Is your site dead slow? Does it use excessive bandwidth? Let me take you on a short journey to the place where you can lessen your worries: the http-response-headers, where some wee small lines of text can define how your site is loaded in cached.
So you’re interested and you are going to read on? In that case let me skip the foolishness (I’m too tired already) and move on to the real stuff. There are 3 types of caching-directives that can be put in the http response: permission-, expiry- and validation-related headers:

  1. Permission-related http response headers tell the caching algorithm if an object can be kept in cache at all. The primary way to do this (in HTTP/1.1-land) is by using cache-control:public, cache-control:private or cache-control:nocache. Nocache should be obvious, private indicates individual browser caches can keep a copy but shared caches (mainly proxies) cannot and public allows all caches to keep the object. Pre-http/1.1 this is mainly done by issuing a Last-Modified-date (Last-Modified; although that in theory is a validation-related cache directive) which is set in the future.
  2. The aim of expiry-related directives is to tell the caching mechanism (in a browser or e.g. a proxy) that upon reload the object can be reused without reconnecting to the originating server. These directives can thus help you avoid network roundtrips while your page is being reloaded. The following expiry-related directives exist: expires, cache-control:max-age, and cache-control:s-maxage. Expires sets a date/time (in GMT) at which the object in cache expires and will have to be revalidated. Cache-control:Max-age and Cache-control:s-maxage (both of which which take precedence if used in conjunction with expires) define how old an object may get in cache (using either the ‘age’ http response header or calculating the age using (Expires – Current date/time). s-maxage is to be used by shared caches (and takes precedence over max-age there), whereas max-age is used by all caches (you could use this to e.g. allow a browser to cache a personalised page, but prohibit a proxy from doing so). If neither expires, cache-control:max-age or cache-control:s-maxage are defined, the caching mechanism will be allowed to make an estimate (this is called “heuristic expiration“) of the time an object can remain in cache, based on the Last-Modified-header (the true workhorse of caching in http/1.0).
  3. Validation-related directives give the browser (or caching proxy) a means by which an object can be (re-)validated, allowing for conditional requests to be made to the server and thus limiting bandwith usage. Response-headers used in this respect are principally Last-Modified (date/timestamp the object was … indeed modified the last time) and ETag (which should be a unique string for each object, only changing if the object got changed).

And there you have it, those are the basics. So what should you do next? Perform a small functional analysis of how you want your site (html, images, css, js, …) to be cached at proxy or browser-level. Based on that tweak settings of your webserver (for static files served from the filesystem, mostly images, css, js) to allow for caching. The application that spits out html should include the correct headers for your pages so these can be cached as well (if you want this to happen, off course). And always keep in mind that no matter how good you analyze your caching-needs and how well you set everything up, it all depends on the http-standards (be it HTTP/1.0 or 1.1) the caching applications follows (so you probably want to include directives for both HTTP/1.0 and 1.1) and how well they adhere to those standards … Happy caching!
Sources/ read on:

(Disclaimer: there might well be some errors in here, feel free to correct me if I missed something!)