Category Archives: Internet

All blogposts on blog.futtta.be about Internet (browsers, web development and mobile web).

WP DoNotTrack whitelist & WordPress/ Jetpack stats

Although the number of pageviews of this blog had already decreased from approx. 2100 pageviews per week before mid May to 1300 pv/week after (I never thought I’d ever be hit by a Panda), yesterday was an absolute disaster. It turns out that Automattic changed the domain of the Jetpack stats tracking pixel to pixel.wordpress.com, which WP DoNotTrack (for which I pushed out a small update in May) blocked, as that domain was not whitelisted. That’s the downside of white- instead of blacklisting.

WordPress-as-a-service tip: Flywheel

At work I was asked to provide advice on WordPress hosting. As we don’t have in-house LAMP-experience and as I didn’t want to have to take care of server operations myself (been there, done that), I decided to look into WordPress as a service solutions. To make things a tad more complicated, hosting had to be in a European data-center as we wanted optimal performance for our local customers and as our Privacy Officer requires all company data to be in Europe.

I contacted several US companies, but eventually Flywheel came out on top; they confirmed they could host in Europe (Amsterdam), seemed pretty eager, had a great package and they could provide me with a test-account to play around with their solution. And so I did; I set up a stock WordPress 3.9.x with Autoptimize and WP YouTube Lyte (call me prejudiced, but I like my own plugins), imported a bunch of posts from this blog and had WebPageTest be the judge.

The results were quite impressive;

First View (Run 3):

  • Load time: 0.457s | First byte: 0.120s | Start render: 0.292s | DOM elements: 926
  • Document Complete: 0.457s, 4 requests, 73 KB in
  • Fully Loaded: 1.008s, 12 requests, 152 KB in

0.120s until first byte, 0.292s start render and 0.457s doc complete? Sweet! So yeah, given those numbers, their offering and the fact that they can deploy to a datacenter in Europe, I do think Flywheel is a great choice for those who are looking for a WordPress-as-a-service (well, PaaS really) solution!

Android Chrome bug when styling unicode character?

While experimenting with the use of Unicode characters in a small proof of concept, I stumbled upon what I think is a bug in Chrome for Android. Apparently the character &#9776; (U+2630), which renders as ☰ and which most people consider the “options”-icon, cannot be given a color in Chrome for Android, whereas other Unicode characters can.

As you can see when visiting this test-page, the 3 symbols are styled correctly (font color white) in most browsers (tested with IE8, Firefox on Windows 7, Ubuntu and Android, and Chrome on Windows 7 and Ubuntu), but the options-symbol is not white in Chrome for Android (at least on my Samsung Galaxy S4).
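For reference, the test boils down to markup along these lines (a reconstruction, not the actual test-page source; the two symbols next to the options-icon are stand-ins):

<!-- white symbols on a dark background; &#9776; is the U+2630 "options"-icon -->
<p style="color: #fff; background: #333; font-size: 2em;">
  &#9776; &#9881; &#9998;
</p>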

So, does this qualify as a bug, or did I just mess up? Anyone happens to know a workaround?

The new m.deredactie.be is no longer mobile!

I ❤ mobile websites, even on the desktop. When refreshing m.deredactie.be today (May 12th) I ended up on the new version. The developers undoubtedly had great fun not only implementing the new look & feel, but also thoroughly rebuilding the underlying technology into a JavaScript-based UI following the “single page application“ philosophy (someone over at the Reyerslaan has clearly been studying angular.js very intensively).

But I, a simple user, am less enthusiastic. The site may look more modern, but it is less usable: without JavaScript there is nothing to see at all (they should take a cue from “cut the mustard“, progressive enhancement as preached by the BBC), “above the fold” there are only images, and above all: everything is suddenly slower!

Because speed is not something you can argue about; faster is better, slower is worse, especially on mobile. The previous version of m.deredactie.be “weighed” roughly 150KB and loaded completely in less than 2 seconds, but the new version clocks in at 2560KB in 6 seconds (measured on webpagetest.org with the “cable” bandwidth profile; with “fast mobile” that becomes 17s).

Is m.deredactie.be a mobile site? 2 megabytes of data say it is not.

PHP HTML parsing performance shootout; regex vs DOM

As I wrote earlier, an Autoptimize user proposed to switch from regular-expression-based script & style extraction to using native PHP DOM functions (optionally with XPath). I created a small test-script to compare performance, and the DOM methods are on average 500% slower than the preg_match-based solution. Here are some details;

  • There are 3 tests; regular expression-based (preg_match), DOM + getElementsByTagName and DOM + XPath. You can see the source here and see it in action here.
  • The code in all 3 test cases does what Autoptimize does first when optimizing JavaScript (see the sketch below this list):
    1. extract all javascript (code if inline, url if external) and add it to an array
    2. remove the javascript from the HTML
  • With each load of the test-script, the 3 tests get executed 100 times and total time per method is displayed.
  • That test-script was run 5 times on 3 different HTML-files; one small mobile page with some JavaScript and two bigger desktop ones with lots of JS.
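For the curious; stripped down to the essence, the 3 approaches look something like this (a simplified sketch, not the actual test-script, and the regex is a simplified version of the kind of pattern used):

// assumes $html contains the full HTML of the page being tested; in the real
// test-script each approach is executed 100 times with microtime() around it

// 1. regular expression based: grab all script-tags (and their contents) in one pass
preg_match_all('#<script.*?>.*?</script>#is', $html, $matches);
$scripts_regex = $matches[0];

// 2. DOM + getElementsByTagName
$dom = new DOMDocument();
@$dom->loadHTML($html);
$scripts_dom = array();
foreach ($dom->getElementsByTagName('script') as $node) {
    $scripts_dom[] = $dom->saveHTML($node); // saveHTML($node) requires PHP 5.3.6+
}

// 3. DOM + XPath
$xpath = new DOMXPath($dom);
$scripts_xpath = array();
foreach ($xpath->query('//script') as $node) {
    $scripts_xpath[] = $dom->saveHTML($node);
}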

The detailed results;

                  total time regex   total time dom   total time dom+xpath
arturo’s HP       0.611              4.8366           4.977
deredactie HP     2.332              25.615           5.879
m deredactie HP   0.0696             0.4604           0.4558

(total time in seconds, for 100 iterations per method)

So while parsing HTML with regular expressions might be frowned upon in developer communities (and rightly so, as a lot can go wrong with PCRE in PHP), it is vastly superior with regard to performance. In the very limited scope of Autoptimize, where the regex-based approach is tried & tested on thousands of blogs, using DOM would simply create too much overhead.
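And that “a lot can go wrong” is no idle warning, by the way; preg_match & friends can fail silently on big pages, for example when PCRE’s backtracking limit is exceeded, so some defensive coding is needed (a minimal sketch):

// preg_match_all returns false on failure (e.g. when pcre.backtrack_limit
// is hit on a huge page); check for that and leave the HTML untouched
if (preg_match_all('#<script.*?>.*?</script>#is', $html, $matches) === false) {
    // preg_last_error() tells you what went wrong; in that case
    // bail out and serve the original, unoptimized HTML
}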

Some HTML DOM parsing gotchas in PHP’s DOMDocument

Although I had used Simple HTML DOM parser for WP DoNotTrack, I’ve been looking into native PHP HTML DOM parsing as a possible replacement for regular expressions in Autoptimize, as proposed by Arturo. I won’t go into the performance comparison results just yet, but here are some of the things I learned while experimenting with DOMDocument, which in turn might help innocent passers-by of this blogpost.

  • loadHTML doesn’t always play nice with different character encodings; you might need something like mb_convert_encoding to work around that (see the second code example below).
  • loadHTML will try to “repair” your HTML to make sure an XML-parser can work with it. So what goes in will not come out the same way.
  • loadHTML will spit out tons of warnings or notices about the HTML not being XML; you might want to suppress error-reporting by prepending the command with an @ (e.g. @$dom->loadHTML($htmlstring);)
  • Using e.g. getElementsByTagName to extract nodes into a separate DOMNodeList and then using that list to change the DOMDocument can result in … unexpected behavior, as the DOMNodeList gets updated when changes are made to the DOMDocument. Copy the DOMNodes from the DOMNodeList into a new array (which will not get altered) and iterate over that to update the DOMDocument, as seen in the example below.
  • removeChild is a method of DOMNode, not of DOMDocument. This means $dom->removeChild(DOMNode) will not work; instead invoke removeChild on the parent of the node you want to remove, as seen in the example below.
// loadHTML from string, suppressing errors
$dom = new DOMDocument();
@$dom->loadHTML($html);

// get all script-nodes
$_scripts = $dom->getElementsByTagName("script");

// move the results from the (live) DOMNodeList into an array
$scripts = array();
foreach ($_scripts as $script) {
   $scripts[] = $script;
}

// iterate over the array and remove the script-tags from the DOM
foreach ($scripts as $script) {
   $script->parentNode->removeChild($script);
}

// write the DOM back to the HTML-string
$html = $dom->saveHTML();
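As for the first gotcha in the list above; the mb_convert_encoding workaround can be as simple as this (assuming the input is UTF-8):

// convert UTF-8 characters into HTML entities, so loadHTML (which assumes
// ISO-8859-1 in the absence of other information) doesn't mangle them
$html = mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8');
@$dom->loadHTML($html);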

Now chop chop, back to my code to finish that performance comparison. Who knows what else we’ll learn ;-)

How to keep Autoptimize’s cache size under control (and improve visitor experience)

Confession time: Autoptimize does not have a proper cache purging mechanism. There are some good reasons for that (see below), but in most cases this is not something to worry about.

Except when it is something to worry about, of course. Because in some cases the amount of cache-files generated by Autoptimize can grow to several gigabytes. Why, you might wonder? Well, for each page being loaded Autoptimize aggregates all JS (and CSS), calculates the hash of that string and checks if an optimized version is in cache under that hash. If there is a difference (even if just a comma), the hash is not the same and the aggregated CSS/JS is cached separately. This behavior is typically caused by plugins that generate JavaScript-variables (or CSS-selectors) that are specific to each page (or, even worse, to each page request). That not only leads to a huge amount of files in the cache, but also impacts visitors, as their browsers will have to request a different optimized CSS- or JS-file for each page instead of reusing the same file across several pages.
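Schematically, the caching logic boils down to something like this (a simplified sketch, not Autoptimize’s actual code; minify_js() is a made-up stand-in):

// aggregate all JavaScript for the page (inline code and the contents of external files)
$aggregated = implode('', $all_js);

// the cache key is simply a hash of that aggregated string
$hash = md5($aggregated);
$cachedir = WP_CONTENT_DIR . '/cache/autoptimize/';
$cachefile = $cachedir . 'autoptimize_' . $hash . '.js';

// only write a new file on a cache miss
if (!file_exists($cachefile)) {
    file_put_contents($cachefile, minify_js($aggregated)); // minify_js() is hypothetical
}

// a page-specific nonce or postid anywhere in $aggregated changes $hash,
// so every single page ends up with its own cached file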

This is what you can do if you want a healthier cache both from a server- and visitor-perspective (based on JavaScript, but the same principle applies to CSS);

  1. Open two similar pages (posts).
  2. View source of the optimized JavaScript in those two pages.
  3. Copy the source of each to a separate file and replace all semi-colons (“;”) with semi-colon+linefeed (“;\n”) in both files.
  4. Execute an automatic comparison between the two using e.g. diff (or “compare” in Notepad++); this should give you one or more lines that are probably almost the same, but not exactly (e.g. with a different nonce or a postid in them). See the snippet after this list for a scripted take on steps 3 and 4.
  5. Now disable JS optimization and look for similar strings in the inline and the external JavaScript.
  6. If you find it in the inline JavaScript, try to identify a unique string in there (the name of a specific variable, probably) and write that down. If the varying JS is in a file, jot down the filename.
  7. Go to the autoptimize settings page and make sure the advanced settings are shown.
  8. Now add the strings or filenames from (6) to “Exclude scripts from Autoptimize:” (which is a comma-separated list).
  9. Re-enable JS optimization.
  10. Save settings & clear cache.
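A quick-and-dirty way to script steps 3 and 4 (the filenames are hypothetical; this assumes you saved the two optimized JS-files locally):

// replace each semi-colon with semi-colon+linefeed so diff can compare line by line
foreach (array('page1.js', 'page2.js') as $file) {
    file_put_contents($file . '.split', str_replace(';', ";\n", file_get_contents($file)));
}
// then compare on the command line: diff page1.js.split page2.js.split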

This does require some digging, but the advantages are clear; a (much) smaller cache-size on disk and better performance for your visitors. Everyone will be so happy, people will want to hug you and there will be much rejoicing, generally.

So why doesn’t Autoptimize have automatic cache pruning? Well, the problem is that a page caching layer (which could be a browser, a caching reverse proxy or a WordPress page caching plugin) contains pages that refer to the aggregated JS/CSS-files. If those optimized files were automatically removed while the page remained in the page caching layer, people would get the cached page without any JS- or CSS-files being available. And as I don’t want Autoptimize to break your pages, I didn’t include an automatic cache purging mechanism. But if you have a bright idea of how this problem could be tackled, I’d be happy to reconsider, of course!