Drupal, mod_cache & RFC2616 caching

Suppose you’re setting up a Drupal-based site for which you have to implement a caching reverse proxy and for reasons beyond your comprehension Varnish (or even Squid) are not an option. Oh no, you’re stuck with Apache’s mod_proxy and mod_cache! What should you do?
First of all, Drupal 6 doesn’t like reverse proxies. If you don’t want to wait for version 7, which should do better in this respect, you might want to look at Pressflow. This Drupal 6 “distro” has everything on board to work with reverse proxies. So install Pressflow (or try to apply this out of date diff to stock Drupal) and in the Performance-screen set “Caching Mode” to “External” and “Page Cache Maximum Age” to the number of minutes you consider a cached page valid. Voila, you’re done in Drupal (edit: almost, as you might also want to change the $base_url in sites/default/settings.php to reverse proxy URL after you configured Apache).
Next up: Apache! A simple configuration like this one should do the trick:

ProxyRequests Off
ProxyPass /rp_drupal http://localhost/pressflow
ProxyPassReverse /rp_drupal http://localhost/pressflow
CacheEnable disk /rp_drupal/
CacheRoot c:/TEMP/apacache
CacheDefaultExpire 3600

OK, this must surely work, no? Well it should, but it doesn’t! When setting your Apache-loglevel to debug you’ll see “not cached” entries in your error-log, with the following reason:

Expires header already expired, not cacheable

Expires in the past, what does Pressflow think it’s doing deep down in includes/bootstrap.inc?

// HTTP/1.0 proxies do not support the Vary header, so prevent any caching
// by sending an Expires date in the past. HTTP/1.1 clients ignores the
// Expires header if a Cache-Control: max-age= directive is specified (see RFC
// 2616, section 14.9.3).
drupal_set_header('Expires', 'Sun, 11 Mar 1984 12:00:00 GMT');
// [...]
$max_age = variable_get('cache', CACHE_DISABLED) == CACHE_AGGRESSIVE && (!isset($_COOKIE[session_name()]) || isset($hook_boot_headers['vary'])) ? variable_get('page_cache_max_age', 0) : 0;
$default_headers['Cache-Control'] = 'public, max-age=' . $max_age;

Darn, those Pressflow-guys seem to have read up on their RFC’s! And indeed, 2616 confirms that cache-control’s max-age overrules expires;

If a response includes both an Expires header and a max-age directive, the max-age directive overrides the Expires header, even if the Expires header is more restrictive. This rule allows an origin server to provide, for a given response, a longer expiration time to an HTTP/1.1 (or later) cache than to an HTTP/1.0 cache.

Mod_cache’s code seems to take a much simpler approach; at line 503 it decides not to cache based on an Expires-header in the past, totally dismissing the potential presence of cache-control’s max-age.

else if (exp != APR_DATE_BAD && exp < r->request_time)
    {
        /* if a Expires header is in the past, don't cache it */
        reason = "Expires header already expired, not cacheable";
    }

But you’re not interested in code which does or does not adhere to whatever RFC some spec-buffs came up with, you just want to cache your frigging’ Drupal-site! Well, fear not little hacker-boy, here’s some Apache-magic to cure your ailments, to be copy/pasted in the config before ProxyPass and ProxyPassReverse:

<Location /rp_drupal>
     SetEnvIf Request_Protocol "HTTP/1.1" expires_overrule
     # homework: add a SetEnvIf to see if cache-control max-age is present
     Header unset Expires env=expires_overrule
</Location>

So there you have it, a rudimentary caching setup for Drupal (in the guise of Pressflow) using nothing but Apache’s mod_proxy and mod_cache. Now go do your homework and test and do some finetuning and test some more. Happy caching!

Why I dislike Facebook’s Like widgets

I like Facebook. I like sharing stuff there, I like liking friends’ activities and I like friends sharing and liking my links and posts. But I really, really don’t like Facebook’s Like buttons and similar boxes! Because I see some serious problems with the like button;

  1. The page containing the “like”-widget loads and renders significantly slower (i.e. performance impact)
  2. Facebook can track me visiting this page, even if I don’t click on “Like” (i.e. privacy issue)
  3. When I do click “Like”, I have no way of checking what will be shown on Facebook. And indeed the buttons are already being used to spread spam, malware is expected to be next (i.e. security risk)
  4. “Liking” a page enters me into a relationship with the page owner, allowing them to “publish updates to the user [and] target ads to people who like [their] content” (i.e. 2nd privacy issue, severely aggravated by the security risk)

No, call me old-fashioned, but I’m much more at ease with the normal Facebook share-mechanism;

  • a simple link, so no performance impact
  • no contact with Facebook unless clicked on, so tracking of my surfing behavior is not possible
  • an intermediate screen shows what you’re about to share, meaning a much lower security risk
  • no forced relationship with the  page owner, i.e. “avert 2nd privacy-risk: CHECK”

But as I can’t force site-owners to remove the “Social Widgets”, I can only install something like No FB Tracking to disable the virus that is the Facebook Like-button. And whine about it on my blog, off course.

Lite YouTube Embeds in WordPress

This 3rd episode in the “High performance YouTube embeds” series brings you yet another way to use LYTE instead of normal YouTube embeds: wp-youtube-lyte. This WordPress-plugin will automatically replace YouTube-links that start with “httpv://” with Lite YouTube Embeds, thereby significantly reducing download size & rendering time.
wp-youtube-lyte plays nice with the great “Smart Youtube” plugin, in which case it will take care of the default embeds (httpv), while Smart Youtube will parse the other types (httpvh, httpvhd, httpvp, …).
You can download the plugin from http://futtta.be/lyte/wp-youtube-lyte.zip.
A quick demo maybe, to finish things off? Owen Pallett performing “Lewis takes action” live in the KCRW studios:

Owen Pallett - Lewis Takes Action

High performance YouTube embeds

It’s all about speed! I mean, you want your visitors to stick around, enjoying your content instead of waiting for stuff to download, no? And I bet you would love Google to consider your site quick, now that speed has been confirmed to have an impact on search ranking?
“High Performance web sites”-guru Steve Souders is doing incredible work studying the impact of 3rd party content -and especially javascript-based services such as analytics and bannering- on the performance of a web site.
An entirely different type of 3rd party content is Flash video, especially YouTube embeds, but there are a number of indicators that these indeed do impact performance as well:

  • opening a page with embedded YouTube can take some time when on dialup (or mobile data)
  • YouTube-heavy pages tend to slow down older computers (flashblock or adblock anyone?)
  • Facebook doesn’t really embed YouTube by default, but uses a placeholder with a thumbnail instead that is replaced by the embedded YouTube only when clicked
  • Google Webmaster Tools “Site Performance” seems to sometimes single out pages with YouTube embeds (e.g. stating that there is a DNS-resolution overhead)

Indeed, when testing this simple page with some text and 2 embedded YouTube clips on webpagetest.org, these were the main results (full results including nice graphs here):

  • (base) download complete: 0.356s for 2KB
  • start render: 0.426s
  • full page download complete: 3.005s for 315KB

There’s good and bad news in those figures. As could be expected the YouTube Flash embed doesn’t impact the rendering of the base page. But 2.6 seconds and 312KB just to display 2 video’s a visitor might not even bother to look at (I bet that the click-rate for embedded YouTube video is somewhere between 2 and 20%), that’s … sub-optimal?
So I threw some JavaScript at my computer to build an alternative to the default YouTube embed, the main goal being to build a Flash-less initial view with only a few lines of html/javascript which at some point people could copy/paste in their site just like they do now. And LYTE (Lite YouTube Embed) came into the world.
The main results when testing this LYTE-test-page on webpagetest.org (full results here):

  • (base) download complete: 0.324s for 4KB (which is marginally faster)
  • start render: 0.363s (again marginally faster)
  • full page download complete: 0.803s for 35KB (leaner, meaner and faster!)

The code that would have to be copy/pasted (multi-line for clarity):
<div class="lyte" id="gnDh6PqWqD8" style="width:480;height:385;"><noscript><a href="http://youtu.be/gnDh6PqWqD8">Watch on YouTube</a></noscript>
<script>(function(){d=document;if(!document.getElementById('lytescr')){lyte=d.createElement('script');lyte.async=true;lyte.id="lytescr";lyte.src="http://futtta.be/lyte/lyte.js";d.getElementsByTagName('head')[0].appendChild(lyte)}})();</script></div>

The nitty-gritty (do skip if you’re not inclined to get aroused by technical details): this code attaches (a minified version of) lyte.js to the page’s head. The real work is done in that javascript-file: get all divs with class-name “lyte” (with a hack for friggin’ IE inlined), use the videoid which is in the divs’ id to fetch the thumbnail and title from YouTube, display these in a fashion which is very YouTube-like and add an onclick eventhandler to replace the fake with a real player when clicked (and remove the eventhandler to clean things up).
So using LYTE you can embed YouTube in such a way that the amount of data, the total download time and the total rendering time are significantly lower, without loosing any functionality.
And this -to conclude this long post- is what LYTE looks like (soundtrack by Nôze – “Meet me in the toilet”, it’s Friday after all);

Speed up your (WordPress-)site!

Google likes fast! Visitors like fast! So why don’t you go make your site really fast?
Suppose you just bought yourself hosting and you just installed WordPress for blogging or lightweight-CMS-purposes, how can you improve your site’s performance in that case? Easy!
  1. speed up PHP: use a caching optimizer (I use APC) to significantly speed up PHP performance (don’t bother  signing up for shared hosting with a company that doesn’t offer PHP with acceleration).
  2. cache dynamic output: install the “WP Super Cache” WordPress plugin. Configure and then forget about it; if you create/edit a blogpost, impacted pages are automatically removed from cache.
  3. optimize CSS and JS: install the “CSS JS booster” WordPress plugin, which (amongst other things) grabs all CSS and JS from WordPress and Plugins and outputs it in one CSS- and one JS-file (some plugins, e.g. Sociable and WordPress Mobile Pack, might need tweaking of the css media-attribute though) UPDATE: CSS JS booster has not been updated since 2010 and I switched to (and later even took over development of) Autoptimize for JS, CSS & HTML optimization.
  4. avoid calling 3rd party javascript: tracking (e.g. Google Analytics, which I removed), widgets (e.g. Twitter badges) or other 3rd party gadgets (e.g. AddToAny, which I removed) can slow down your site’s performance significantly
  5. optimize images: fire up your favorite photo editor and make that image just a bit smaller, use an acceptable level of compression (I end up between 70 and 80% for JPEG’s, depending on the image) and upload to smushit.com to squeeze out the last optimization-drop (example; I used a 20KB picture from Flickr, resized it to 80%, saved it with 77% compression and smushed it to end up with a mere 6KB).

The impact of a number of these steps can be measured easily; below are the response times of my blog’s homepage (the  html including css, js and images) as measured by Pingdom Tool’s Full Page Test.

  1. default WordPress (on a Linux VPS with 320Mb RAM memory): 6.5 seconds
  2. (1)  with PHP APC activated: 4.1 seconds
  3. (2) with WP Super Cache: 3.1 seconds
  4. (3) with CSS JS Booster: 1.3 seconds

So there you have it, from 6.5 to 1.3 seconds in only 5 easy steps! WordPress specific, but easily applicable to other platforms as well. Now go and make your site fast! And then go and make it even faster!

AddToAny removed-from-here


Update 02-2015: the information below does not reflect the way AddToAny works now and as such only has historical value, read this comment by the developer for more info.
When looking at my blog’s performance in Google Webmaster Tools I saw Google complained of multiple dns-lookups. I knew about stats.wordpress.com, google-analytics.com (well, yeah …) and gravatar.com, but one domain in the list didn’t make sense to me at all; media6degrees.com, so I started to investigate a bit. Grepping the wordpress-, theme- and plugin-code on my server didn’t reveal anything, so I went into Firebug to see what was happening in javascript.
Apparently the AddToAny WordPress-plugin was initiating the call:
  1. add-to-any requests http://static.addtoany.com/menu/page.js (which is rather big but gzipped & cache-able)
  2. page.js in turn contains tracking (near the end of the file), by requesting an 1X1 pixel image at http://map.media6degrees.com/orbserv/hbpix?pixId=2869&curl=<encoded URL of page>
  3. media6degrees then sends the pixel and … sets multiple cookies in the process

And what’s media6degrees business you ask? Maybe they’re just providing the add-to-any author with statistics? Well, not exactly. This is what media6degrees writes on their website: “We deliver scalable custom audiences to major marketers by utilizing the online connections of their consumers.” So by using AddToAny, you’re providing media6degrees with data about your site’s visitors, which they can use to sell targeted communication to their customers.
If visitors of small-time blogs like mine would be the only ones affected by this, the damage would be limited. But AddToAny is also implemented on large local news-outlets such as deredactie.be or De Standaard Online and no doubt on some big international sites as well. Somehow I doubt those organizations know they’re feeding their visitors to media6degrees and I bet some of them would even strongly disagree.
I’m not happy about this, that much is clear. AddToAny offers great functionality, but:

  • it adds unneeded requests to my page, causing the page to finish loading later (dns-request + http-request)
  • it enrolls my site visitors in a targeted communication platform without anyone knowing (or agreeing)
  • none of this is communicated on the AddToAny website or on the AddToAny WordPress plugin page

I mailed the author about this earlier this week (when i didn’t even know about media6degrees tracking cookies yet), but got no feedback up until now and I logged an issue on the wordpress.org support forum as well. And I decided to pull the plug on AddToAny off course, replacing it with sociable, making my blog render yet another millisecond faster, while at the same time protecting my visitors from this sneaky behavioral tracking by AddToAny and media6degrees.

Google Webmaster Tools Irony

A couple of days ago Google launched Site Performance as a labs project in Google Webmaster Tools. Being obsessed with speed is great and there indeed are valid remarks for this blog, but what to think of the advice below:

google site performance ironyThat’s right, I really should gzip-compress those javascript-resources! Google, please give me access to your servers so I can fix this immediately!

Trading eAccelerator for APC

Yesterday I somewhat reluctantly removed eAccelerator from my server (Debian Etch) and installed APC instead. Not because I wasn’t satisfied with performance of eAccelerator, but because the packaged version of it was not in the Debian repositories (Andrew McMillan provided the debs), and those debs weren’t upgraded at the same pace and thus broke my normal upgrade-routine. Moreover APC will apparently become a default part of PHP6 (making the Alternative PHP Cache the default opcode cache component). Installation was as easy as doing “pecl install apc” and adding apc to php.ini. Everything seems to be running as great as it did with eAccelerator (as most test seem to confirm).