Choosing a CDN on a whim

I had to look into CDNs some time ago, to find a suitable temporary solution for a problem at work. There are a lot of players in this field, Akamai and Amazon (Cloudfront) being market leaders of sorts, but there’s also Microsoft with their Azure CDN (which we already had some experience with), other big guns such as Rackspace and Level3, and specialized shops such as CacheFly, CDNetworks and NetDNA. So how to choose?

Results only relevant for Belgium (and even then …)

CDN                  avg. speed (ms) for 64kB   speed (% of fastest)
softlayer            121.3                      100%
gogrid               123.0                      101%
microsoft azure      132.0                      109%
level3               132.0                      109%
amazon cloudfront    133.3                      110%
maxcdn               136.7                      113%
cotendo              138.7                      114%
cachefly             147.3                      121%
rackspace            156.3                      129%
highwinds            226.3                      187%
voxcast              227.7                      188%
flexiscale           317.3                      262%
amazon s3 eu         417.3                      344%
cloudflare           575.0 (invalid result)     474%
google appspot       668.0                      551%
voxel nl             814.0                      671%
amazon s3 us         932.0                      768%
voxel ny             942.0                      776%

Well, if you’re in a hurry, you could compare price and features via cdnplanet.com. The info might not always be complete, but it does give you a good first idea, and you can always visit the CDN’s own site for more details, can’t you?
After comparing features & pricing, you really should get an idea of the speed of these CDNs, of their performance as experienced by your customers. I found this CDN Speed Test on cloudclimate.com very useful; it performs a live test of approximately 20 CDN providers, requesting a 64 kilobyte file 10 times per CDN from within your browser. So if you can get a sample of your customers to perform that test and provide you with the results, you’ll have some very useful information about performance. Together with your overview of features and price, you should be able to make at least a vaguely educated decision, no?
To get an idea about performance for our market (Belgium), I asked some Facebook friends to provide me with the results of the CDN Speed Test. Most of the data I received was for Telenet or Skynet/Belgacom, not coincidentally the biggest ISPs here. You can see the aggregated results in that ugly table on the left (or a couple of paragraphs up, if you’re subscribed to the RSS feed).
My conclusion: as I was looking for a pay-as-you-go (no obligations, no monthly fee) CDN for static files, with support for Origin-Pull, HTTPS and some administration features (for example to purge the cache and watch nice graphs), MaxCDN fit the picture pretty well. With a great introductory price ($40 for the first Terabyte and even less if you find the coupon code) and an average response time only 13% above that of the fastest competitor, they seem to have found somewhat of a sweet spot for my specific context.
The only problem: I’ve got to wait for a “GO” from some people higher up the food chain. Maybe I should already implement it on my blog, just for the fun of it?

Firefox Mobile: the best mobile browser no-one uses

I’ve always enjoyed riding the Firefox-bandwagon and that hasn’t changed, even though Google Chrome seems to be the browser of choice amongst the cool kids nowadays. And if only because I’m a faithful guy, I’ve been running Firefox Mobile ever since I bought a Samsung Galaxy SII as well. Sure it doesn’t do Flash, but I’m not that Flash-inclined anyway.
Now, I haven’t met too many people that use Firefox Mobile and indeed when reading about mobile browsers, Firefox is rarely if ever mentioned. But what if I told you that Firefox Mobile is by far the best browser on mobile when taking performance, features and security into consideration?
I won’t beat around the bush, here’s the pretty objective data.

browser               hardware              Sunspider   v8 benchmark   html5test score
Firefox Mobile 9b     Samsung Galaxy SII    1421.9ms    832            314
Android 2.3 browser   Samsung Galaxy SII    3454.4ms    369            177
Android 4 browser     Google Galaxy Nexus   1983ms      1387           230
Mobile Safari         iPhone 4s             2260.9ms    368            296
Opera Mobile 11.5     Samsung Galaxy SII    1699.9ms    461            285
Dolphin HD 7.2        Samsung Galaxy SII    3593.4ms    318            177

Some remarks:

  • the hardware is pretty comparable; all dual-core CPUs and plenty of RAM.
  • higher is better, except for Sunspider, which measures time (in milliseconds), so lower is better there.
  • I’ve got no screenshot or URL of the Google v8 test results on my phone, but I’ll be glad to reproduce them.
  • Sunspider and v8 are JavaScript performance benchmarks.
  • html5test is an indication of support for “modern” browser features (html5, css3 and much more).
  • the features of the browser GUI aren’t measured by html5test, but I’m pretty pleased with Firefox Mobile in that respect as well; great tabbed browsing, plugins (including noscript!), sync-ing of all relevant data between desktops & mobile, …
  • I added Opera Mobile and Dolphin HD to the list. Opera’s not too shabby, but not a winner either.

And last but not least: as Firefox Mobile isn’t the platform’s native browser and since it’s on the same (crazy) rapid release cycle as the desktop version, I consider it to be a lot more secure than the slowly evolving, rarely updated native browsers in Android and iOS.
My advice: if you’re an Android user and you’ve got a recent handset or tablet, you really should consider switching to Firefox Mobile. It’s the best mobile browser no-one is using! Except for you?

Ranting & Raving at Drupal Summit 2011

I attended Drupal Summit in Genk a couple of days ago and amidst the general “Drupal is the best thing since sliced bread” atmosphere, there were some interesting discussions about the platform’s maturity. Especially the presentations by Peter Van Den Broeck (for VRT) and Wouter Mertens (for competitor VMMA) seemed to be on opposite sides, with VMMA having multiple successful Drupal sites in production and VRT struggling to get their projects finished, telling Captain Buytaert “Dries, we’re not there yet“. But underneath the surface and despite the differences (a dedicated team of sysadmins & developers at VMMA vs fixed-price, project-based external development for VRT), both were talking about the same problems on a technical level: modules & performance.
The Drupal module community is a great bunch of enthusiastic developers, building an impressive number of modules that cover almost any feature one would want to have in a website. But module quality, support & compatibility vary enormously. Some modules seem to be true Rube Goldberg machines, providing tons of functionality that only few people need, but which makes the UI a usability hell and the code complex, error-prone and possibly a real performance hog.
And while we’re on the subject: performance doesn’t come out of the box and Drupal as such does not scale very well. Install it with a bunch of modules, generate a reasonable amount of page requests and you have a CPU-intensive system that generates a crapload of database connections. Adding memcache between your Drupal & MySQL helps (a minimal sketch below), as most requests will be handled from cache. And putting a caching reverse proxy (Varnish, Squid or even Apache’s mod_proxy+mod_cache if you insist) in front of your Drupal works miracles, serving visitors the same content without the need for Drupal to bootstrap. So sure, you can build a scalable solution that provides great performance, but one could say this is despite Drupal, not because of it. After all, when using Memcache & Varnish almost any CMS will have great performance, won’t it?
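For the memcache part, the Drupal 6 memcache module is typically wired up with a couple of lines in settings.php; a minimal sketch, assuming the module is installed under sites/all/modules and memcached runs locally on its default port:

// sketch for sites/default/settings.php (assumed paths & addresses)
$conf['cache_inc'] = './sites/all/modules/memcache/memcache.inc';
$conf['memcache_servers'] = array('127.0.0.1:11211' => 'default');
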
So yeah, Drupal can be a nice solution to your problem, but it does require more than just a superficial knowledge of how to install it together with some modules and a theme. Make sure there are smart people on your team or project who have a profound knowledge of modules & module development and who know a lot about MySQL (and clustering, mirroring or master/slave setups), Memcache and Varnish (or Squid; forget about mod_cache if you can). You’re bound to run into some problems, as Peter Van Den Broeck confirmed, but with the right people and architecture your Drupal project can indeed be the best thing since sliced bread.

Quick & dirty “CDN” in WordPress

If you want a quick & dirty way to speed up WordPress without any plugins:

  1. create a subdomain in DNS, e.g. static.yourblog.org
  2. add that subdomain as a ServerAlias to your Apache config, as sketched below this list (remember to restart Apache)
  3. add the following to your wp-config.php:
    define('WP_CONTENT_URL','http://static.yourblog.org/wp-content');
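A minimal vhost sketch for step 2 (hypothetical domain and paths, adapt to your own config):

<VirtualHost *:80>
    ServerName yourblog.org
    # the static subdomain serves exactly the same files as the main site
    ServerAlias static.yourblog.org
    DocumentRoot /var/www/yourblog
</VirtualHost>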

If you want to do this the right way, plugins such as WP Super Cache or W3 Total Cache do a much better job at this.

3 Apache mod_cache gotchas

If you want to avoid the learning curve of Squid and Varnish or the cost of a dedicated caching & proxying appliance, using Apache with mod_cache may seem like a good, simple and cheap solution. Rest assured, it can be (to some extent), but here are 3 gotchas I learned the hard way, with a combined config sketch after the list:

  1. mod_cache ignores Cache-Control if Expires is in the past (which it shouldn’t, according to RFC2616), so you might have to unset the Expires header.
  2. mod_cache by default caches cookies! Let me repeat: cookies are cached! That might be a huge security disaster waiting to happen: session IDs (which provide access for logged-on users) are generally stored in cookies. If a logged-on user requests an uncached page, that user’s cookie will get cached and sent to other users that request the same page. Disable this by adding “CacheIgnoreHeaders Set-Cookie” to your config.
  3. mod_cache by default treats all browsers like the one that triggered the caching of the object. In the field, that approach can cause problems with e.g. CSS files that are stored gzipped (because the first browser requested them with the header “Accept-Encoding: gzip, deflate”). If a browser that does not support gzipped content requests the same file, the CSS will be unreadable and thus not applied. The solution: make sure the “backend webserver” sends the “Vary: Accept-Encoding” header in the response (especially for CSS files). This will tell mod_cache to take different Accept-Encodings into account, storing and sending different versions of the same CSS file.
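The combined sketch; where exactly these directives go is an assumption on my part, as it depends on how your proxy and backend are split up:

# gotcha 1: strip the bogus Expires-from-the-past so Cache-Control can take effect
Header unset Expires
# gotcha 2: never let mod_cache store responses' Set-Cookie headers
CacheIgnoreHeaders Set-Cookie
# gotcha 3: have the backend mark compressed responses as negotiated,
# so mod_cache stores and serves a variant per Accept-Encoding
<FilesMatch "\.css$">
    Header append Vary Accept-Encoding
</FilesMatch>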

Why your WordPress blog needs DoNotTrack

So what’s with all that nagging about tracking and that DoNotTrack plugin, you might wonder? Well, it’s pretty simple actually.

  1. Some very popular WordPress plugins include 3rd party tracking, sometimes without properly disclosing it, and often without any means to disable this behavior
  2. 3rd party tracking has privacy implications: all your visitors are tracked by the 3rd party, in general for behavioral marketing purposes (depending on what data is captured, tracking might even be illegal in some countries)
  3. 3rd party tracking has a performance impact: every visit to your blog will include between 2 and 5 extra requests for the 3rd party tracking to succeed, effectively delaying full page rendering

It is my conviction that blog owners should be able to install and use WordPress plugins without having to worry about undisclosed tracking and that plugins should provide a way to disable such 3rd party tracking if included.
As this is not the case yet, we have to resort to (messy) solutions to stop unwanted tracking from happening. And that’s exactly what DoNotTrack does. It’s a small JavaScript hack in a WordPress plugin to stop 3rd party tracking introduced by some of the most popular plugins.
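To give you an idea of the approach, here’s a simplified sketch of the general idea, not the plugin’s actual code (the blocklist entries are just examples): tracking snippets often use document.write to add their script tags, so wrapping that function lets you drop anything referencing a known tracking domain.

// simplified sketch: wrap document.write and silently drop
// any markup that references a blocklisted tracking domain
(function() {
  var blocked = ['quantserve.com', 'media6degrees.com'];
  var originalWrite = document.write;
  document.write = function(markup) {
    for (var i = 0; i < blocked.length; i++) {
      if (markup.indexOf(blocked[i]) !== -1) {
        return; // tracking snippet detected, don't write it
      }
    }
    originalWrite.call(document, markup);
  };
})();
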
Some details from the readme.txt:

  • What works:
  • What does not work (yet): Tracking code added using innerHTML or appendChild/insertBefore is not yet intercepted (but I’m working on a solution for that)
  • What else might be added:
  • How you can help:
    • Provide me with links to plugins that include browser-based tracking + domain where the tracking is done.
    • Provide me with known opt-out code (javascript) to disable tracking services on a site.
    • Tell plugin writers you’re not happy with 3rd party tracking!
    • Tell your visitors about tracking & privacy, link to e.g. http://www.privacychoice.org/

And remember: if you host your WordPress blog yourself, you and nobody else should be able to decide who tracks your users!

Quantcast spyware puts self-hosted WordPress blogs in Automattic’s network

A quick update about the WordPress.com Stats plugin’s secretive inclusion of Quantcast tracking:

Website performance: impact of web fonts

Although it was possible to embed fonts in web pages even during the dark ages of the internet, 2010 was the year when the pieces of the web typography puzzle really fell into place. Designers, marketeers and especially “brand identity guards” are ecstatic about this evolution.
But there’s little to be found on the performance impact of web fonts. You’re downloading extra files, so there must be some impact, no? I created 4 identical pages: one without non-browser-native fonts, one with Cufon (which I already knew I wasn’t fond of), one with the hosted Google Font solution and one with local fonts (from fontsquirrel.com). Those 4 pages were analyzed by webpagetest.org (3 runs on IE8 from the Amsterdam node). These are the results:

                   doc complete (s)   # requests   download size (kB)   diff with basic (s)   full test result URL
basic              1.507              21           135                  -                     http://www.webpagetest.org/result/110106_1Y_7S9K/3/details/
cufon              2.452              26           176                  0.95                  http://www.webpagetest.org/result/110106_F8_7S9P/1/details/
webfont (local)    1.807              23           189                  0.30                  http://www.webpagetest.org/result/110106_V3_7SBD/1/details/
webfont (google)   1.771              23           162                  0.26                  http://www.webpagetest.org/result/110106_S4_7SM2/3/details/

As was to be expected and confirmed in the Google Webfonts FAQ, even the fastest method (hosted Google Fonts) is 0.26s slower than the version without web fonts. Locally hosted fonts are marginally slower, but that’s because (for reasons I don’t fully understand) 2 font files were loaded instead of only 1 in the Google Fonts-powered page. Cufon isn’t really worth mentioning here; it’s dead-slow and has become rather irrelevant I guess. I did not test a version with images instead of web fonts, but that would probably yield approximately the same result (5 image files, each somewhere between 4 and 8kB?) and is sub-optimal from an SEO and accessibility point of view.
So 0.3 seconds, that’s not too bad, no? Or is it? If you’re creating an online shop, Amazon’s experience might be of particular interest here: every 100ms of delay costs them 1% in sales. So are you willing to lose 3% of revenue just to have those nice fonts, which your visitors probably couldn’t care less about?
My advice: don’t use web fonts if you value performance. And you should consider performance a top priority, not just for sales’ sake, but because raw website performance also impacts usability and search engine ranking.
If your brand identity manager insists on using the right fonts, performance be damned, then try to at least follow these basic best practices:

  • have the CSS that defines your fonts (with font-face) as early in your HTML as possible and certainly before any script is loaded
  • use Google’s hosted font solution if possible; it’s fast, the CSS is optimized for the browser that requests it and there are a lot of free fonts at your disposal
  • if you have to host the fonts yourself, make sure to serve the WOFF format (the new standard) and SVG (for iPhone/iPad) next to the more traditional TTF (the “old” standard) and EOT (for MSIE); an example @font-face rule follows after the Apache snippet below
  • if you’re hosting your own fonts, configure your webserver to compress ttf, svg & eot files on the fly (woff files are already compressed) and set expires-headers for all fonts in the far future to allow optimal caching (example for Apache):

# map the font extensions to mime-types so the directives below can target them
AddType font/ttf .ttf
AddType font/x-woff .woff
AddType image/svg+xml .svg
AddType application/vnd.ms-fontobject .eot
# far-future expires-headers for optimal caching
ExpiresByType font/ttf "access plus 30 days"
ExpiresByType font/x-woff "access plus 30 days"
ExpiresByType image/svg+xml "access plus 30 days"
ExpiresByType application/vnd.ms-fontobject "access plus 30 days"
# on-the-fly compression for everything except the already-compressed woff
AddOutputFilterByType DEFLATE font/ttf application/vnd.ms-fontobject image/svg+xml
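
And the @font-face rule for serving all those formats could look like this; a sketch with a hypothetical font name and file paths, based on the commonly used “bulletproof” syntax:

@font-face {
    font-family: 'MyWebFont';
    src: url('mywebfont.eot'); /* MSIE */
    src: url('mywebfont.eot?#iefix') format('embedded-opentype'),
         url('mywebfont.woff') format('woff'), /* the new standard */
         url('mywebfont.ttf') format('truetype'), /* the "old" standard */
         url('mywebfont.svg#MyWebFont') format('svg'); /* iPhone/iPad */
}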

WordPress.com Stats trojan horse for Quantcast tracking

Suppose you’re a blogger who values website performance and online privacy. You may have ditched Google Analytics because you think the do-no-evilers do not have to know who is on your site. Maybe you removed AddtoAny because of the 3rd party tracking code that slows down your site ever so slightly. And you don’t want the omnipresent Facebook Like widget for all the above reasons. No, the only 3rd party javascript you allow is the one pushed by the WordPress.com Stats plugin; one JavaScript file and one pixel and you get some nice stats in return. And come on, WordPress, those are the good guys, right?
Well, apparently not. When performing a test on e.g. webpagetest.org, you’ll see two requests to the quantserve.com domain:

http://edge.quantserve.com/quant.js
http://pixel.quantserve.com/pixel;r=705640318;fpan=1;fpa=P0-450352291-1292419712624;ns=0;url=http%3A%2F%2Fblog.futtta.be%2F;ref=;ce=1;je=1;sr=1024x768x32;enc=n;ogl=;dst=1;et=1292419712624;tzo=300;a=p-18-mFEk4J448M;labels=type.wporg

Ouch, that hurts! But surely Quantcast isn’t in the same league as AddtoAny’s media6degrees, who do behavioral advertising based on data captured all across the web? Well … Quantcast might be better known, but they do exactly the same thing: collecting user information and providing that info for targeted advertising. And just so you know, Quantcast is one of the companies on trial for restoring deleted cookies using Flash (“zombie cookies”). So no, I’m not comfortable with Quantcast collecting data on my blog’s visitors.
Now, I know that I opted in to user tracking by WordPress (or rather Automattic). And I can live with them knowing who visits my blog; I can live with the small performance impact the stats plugin has on my site that way. But I did not sign up for 3rd party tracking: the plugin page conveniently fails to mention the extra tracking, there’s no opt-out mechanism in the plugin and there’s no info to be found on how to disable Quantcast tracking users on my own blog. I am not a happy WordPress blogger!
So Automattic: please fess up and at least provide instructions on how to disable 3rd party tracking, just like AddtoAny’s Pat gracefully did?


Update 20 January 2011: Automattic seems unwilling to acknowledge there is a problem; the thread on the wordpress.org forums where this was discussed has been closed. I created a small WordPress plugin, DoNotTrack, to stop Quantcast tracking. You can download it here.

Drupal, mod_cache & RFC2616 caching

Suppose you’re setting up a Drupal-based site for which you have to implement a caching reverse proxy, and for reasons beyond your comprehension Varnish (or even Squid) is not an option. Oh no, you’re stuck with Apache’s mod_proxy and mod_cache! What should you do?
First of all, Drupal 6 doesn’t like reverse proxies. If you don’t want to wait for version 7, which should do better in this respect, you might want to look at Pressflow. This Drupal 6 “distro” has everything on board to work with reverse proxies. So install Pressflow (or try to apply this out-of-date diff to stock Drupal) and in the Performance screen set “Caching Mode” to “External” and “Page Cache Maximum Age” to the number of minutes you consider a cached page valid. Voila, you’re done in Drupal (edit: almost, as you might also want to change the $base_url in sites/default/settings.php to the reverse proxy URL after you’ve configured Apache; see the sketch below).
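That $base_url change is a one-liner in sites/default/settings.php; a sketch, assuming the proxy will answer at /rp_drupal on localhost as in the Apache config further down:

// point Drupal at the reverse proxy URL instead of the backend URL
$base_url = 'http://localhost/rp_drupal';
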
Next up: Apache! A simple configuration like this one should do the trick:

# no forward proxying, we only want the reverse proxy
ProxyRequests Off
# map /rp_drupal on this server to the Pressflow backend
ProxyPass /rp_drupal http://localhost/pressflow
ProxyPassReverse /rp_drupal http://localhost/pressflow
# cache the proxied content on disk, valid for an hour by default
CacheEnable disk /rp_drupal/
CacheRoot c:/TEMP/apacache
CacheDefaultExpire 3600

OK, this must surely work, no? Well it should, but it doesn’t! When setting your Apache-loglevel to debug you’ll see “not cached” entries in your error-log, with the following reason:

Expires header already expired, not cacheable

Expires in the past, what does Pressflow think it’s doing deep down in includes/bootstrap.inc?

// HTTP/1.0 proxies do not support the Vary header, so prevent any caching
// by sending an Expires date in the past. HTTP/1.1 clients ignores the
// Expires header if a Cache-Control: max-age= directive is specified (see RFC
// 2616, section 14.9.3).
drupal_set_header('Expires', 'Sun, 11 Mar 1984 12:00:00 GMT');
// [...]
$max_age = variable_get('cache', CACHE_DISABLED) == CACHE_AGGRESSIVE && (!isset($_COOKIE[session_name()]) || isset($hook_boot_headers['vary'])) ? variable_get('page_cache_max_age', 0) : 0;
$default_headers['Cache-Control'] = 'public, max-age=' . $max_age;

Darn, those Pressflow guys seem to have read up on their RFCs! And indeed, RFC 2616 confirms that Cache-Control’s max-age overrules Expires:

If a response includes both an Expires header and a max-age directive, the max-age directive overrides the Expires header, even if the Expires header is more restrictive. This rule allows an origin server to provide, for a given response, a longer expiration time to an HTTP/1.1 (or later) cache than to an HTTP/1.0 cache.

Mod_cache’s code seems to take a much simpler approach: at line 503 it decides not to cache based on an Expires header in the past, totally dismissing the potential presence of Cache-Control’s max-age.

else if (exp != APR_DATE_BAD && exp < r->request_time)
    {
        /* if a Expires header is in the past, don't cache it */
        reason = "Expires header already expired, not cacheable";
    }

But you’re not interested in code which does or does not adhere to whatever RFC some spec buffs came up with, you just want to cache your friggin’ Drupal site! Well, fear not, little hacker-boy, here’s some Apache magic to cure your ailments, to be copy/pasted into the config before ProxyPass and ProxyPassReverse:

<Location /rp_drupal>
     SetEnvIf Request_Protocol "HTTP/1.1" expires_overrule
     # homework: add a SetEnvIf to see if cache-control max-age is present
     Header unset Expires env=expires_overrule
</Location>

So there you have it: a rudimentary caching setup for Drupal (in the guise of Pressflow) using nothing but Apache’s mod_proxy and mod_cache. Now go do your homework: test, do some fine-tuning and test some more. Happy caching!