Drupal, mod_cache & RFC2616 caching

Suppose you’re setting up a Drupal-based site for which you have to implement a caching reverse proxy and for reasons beyond your comprehension Varnish (or even Squid) are not an option. Oh no, you’re stuck with Apache’s mod_proxy and mod_cache! What should you do?
First of all, Drupal 6 doesn’t like reverse proxies. If you don’t want to wait for version 7, which should do better in this respect, you might want to look at Pressflow. This Drupal 6 “distro” has everything on board to work with reverse proxies. So install Pressflow (or try to apply this out of date diff to stock Drupal) and in the Performance-screen set “Caching Mode” to “External” and “Page Cache Maximum Age” to the number of minutes you consider a cached page valid. Voila, you’re done in Drupal (edit: almost, as you might also want to change the $base_url in sites/default/settings.php to reverse proxy URL after you configured Apache).
Next up: Apache! A simple configuration like this one should do the trick:

ProxyRequests Off
ProxyPass /rp_drupal http://localhost/pressflow
ProxyPassReverse /rp_drupal http://localhost/pressflow
CacheEnable disk /rp_drupal/
CacheRoot c:/TEMP/apacache
CacheDefaultExpire 3600

OK, this must surely work, no? Well it should, but it doesn’t! When setting your Apache-loglevel to debug you’ll see “not cached” entries in your error-log, with the following reason:

Expires header already expired, not cacheable

Expires in the past, what does Pressflow think it’s doing deep down in includes/bootstrap.inc?

// HTTP/1.0 proxies do not support the Vary header, so prevent any caching
// by sending an Expires date in the past. HTTP/1.1 clients ignores the
// Expires header if a Cache-Control: max-age= directive is specified (see RFC
// 2616, section 14.9.3).
drupal_set_header('Expires', 'Sun, 11 Mar 1984 12:00:00 GMT');
// [...]
$max_age = variable_get('cache', CACHE_DISABLED) == CACHE_AGGRESSIVE && (!isset($_COOKIE[session_name()]) || isset($hook_boot_headers['vary'])) ? variable_get('page_cache_max_age', 0) : 0;
$default_headers['Cache-Control'] = 'public, max-age=' . $max_age;

Darn, those Pressflow-guys seem to have read up on their RFC’s! And indeed, 2616 confirms that cache-control’s max-age overrules expires;

If a response includes both an Expires header and a max-age directive, the max-age directive overrides the Expires header, even if the Expires header is more restrictive. This rule allows an origin server to provide, for a given response, a longer expiration time to an HTTP/1.1 (or later) cache than to an HTTP/1.0 cache.

Mod_cache’s code seems to take a much simpler approach; at line 503 it decides not to cache based on an Expires-header in the past, totally dismissing the potential presence of cache-control’s max-age.

else if (exp != APR_DATE_BAD && exp < r->request_time)
    {
        /* if a Expires header is in the past, don't cache it */
        reason = "Expires header already expired, not cacheable";
    }

But you’re not interested in code which does or does not adhere to whatever RFC some spec-buffs came up with, you just want to cache your frigging’ Drupal-site! Well, fear not little hacker-boy, here’s some Apache-magic to cure your ailments, to be copy/pasted in the config before ProxyPass and ProxyPassReverse:

<Location /rp_drupal>
     SetEnvIf Request_Protocol "HTTP/1.1" expires_overrule
     # homework: add a SetEnvIf to see if cache-control max-age is present
     Header unset Expires env=expires_overrule
</Location>

So there you have it, a rudimentary caching setup for Drupal (in the guise of Pressflow) using nothing but Apache’s mod_proxy and mod_cache. Now go do your homework and test and do some finetuning and test some more. Happy caching!