Applying JavaScript AOP magic to stop third-party tracking in WordPress

It was always my intention to elaborate on my small donottrack plugin for WordPress, but it was only when Automattic upgraded to the new asynchronous Quantcast code that I was forced to actually dig in.
The new Quantcast code doesn’t use the old-fashioned document.write. Instead it inserts the JavaScript asynchronously by calling insertBefore on the parent of the first script node (a technique popularized by the asynchronous Google Analytics code). Variations on this method include using appendChild, or adding the script to head (although that might not exist).
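For readers who haven’t seen the pattern, here is a minimal, generic sketch of that asynchronous insert. It is not Quantcast’s actual snippet; the function name insertAsyncScript and the doc/src parameters are hypothetical, and in a browser you would pass the real document object.

```javascript
// Generic sketch of the async-insert pattern (not Quantcast's real code).
// "doc" is a DOM Document, "src" the URL of the script to load.
function insertAsyncScript(doc, src) {
  var s = doc.createElement('script');
  s.async = true; // don't block page parsing while the script downloads
  s.src = src;
  // Find the first existing <script> and insert the new one right before it,
  // via that script's parent node (which may be <head> or <body>).
  var first = doc.getElementsByTagName('script')[0];
  first.parentNode.insertBefore(s, first);
  return s;
}
```

In a page this would be called as insertAsyncScript(document, 'https://tracker.example/t.js') (a made-up URL). Because the script is added by insertBefore rather than written into the page, anything that wants to block it has to intercept that DOM call.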
A couple of months ago I experimented with the DOMNodeInserted event, but that isn’t supported by all browsers. And even where it works, I found no consistent way to stop the tracking script from being loaded or executed, because the event only fires after the node has already been added to the DOM. But last week, while searching for a better solution, I found a reference to JavaScript AOP on StackOverflow, and after following some links I discovered the jQuery AOP plugin.
jQuery AOP allows one to (amongst other things) add advice around a method. When the method is called, the around advice runs first: it is a function that can inspect and change the arguments passed to the method, and it then decides whether to let the original method proceed. That’s exactly what the current version of DoNotTrack does. Using AOP.around (I’ve removed the jQuery dependency), it intercepts insertBefore and appendChild, inspects the src attribute of the node being inserted and, if it points to quantserve.com, replaces that value before letting the method execute.
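To make “around advice” concrete, here is a minimal hand-rolled version in plain JavaScript. It is a hypothetical re-implementation for illustration, not jQuery AOP’s code or the exact API of my plugin; it only mimics the invocation.arguments / invocation.proceed() shape used in the DoNotTrack snippet.

```javascript
// Minimal "around advice" (hypothetical helper, not the jQuery AOP plugin).
// Replaces obj[methodName] with a wrapper that hands the advice an invocation
// object; the advice may inspect or modify invocation.arguments and then call
// invocation.proceed() to run the original method with those arguments.
function around(obj, methodName, advice) {
  var original = obj[methodName];
  obj[methodName] = function () {
    var self = this; // keep the original "this" for the real method
    var invocation = {
      method: methodName,
      arguments: Array.prototype.slice.call(arguments),
      proceed: function () {
        return original.apply(self, invocation.arguments);
      }
    };
    return advice(invocation);
  };
}

// Example: an advice that zeroes the second argument before proceeding.
var calc = { add: function (a, b) { return a + b; } };
around(calc, 'add', function (invocation) {
  invocation.arguments[1] = 0;
  return invocation.proceed();
});
// calc.add(2, 3) now returns 2
```

The advice never has to know how the original method works; it just rewrites the arguments (here the second number, in DoNotTrack the src of the inserted node) and passes control on.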

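// Parent of the first <script>: the node the async Quantcast snippet inserts into
var scriptParent = document.getElementsByTagName('script')[0].parentNode;

// Wrap insertBefore and appendChild (anchored regex, not a character class)
aop.around( {target: scriptParent, method: /^(insertBefore|appendChild)$/},
        function(invocation) {
                var node = invocation.arguments[0];
                var tag = (node && node.tagName) ? node.tagName.toLowerCase() : '';
                // Only inspect <script> and <img> elements that carry a src
                if ((tag === 'script' || tag === 'img') && typeof node.src === 'string') {
                        // sanitizer() (defined elsewhere in the plugin) returns true for blocked URLs
                        if (sanitizer(node.src) === true) {
                                node.src = 'about:blank'; // neutralize: the tracker URL is never requested
                        }
                }
                // Run the original DOM method with the (possibly modified) node
                return invocation.proceed();
        }
);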
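The snippet relies on a sanitizer function that isn’t shown. Below is a hypothetical version of what such a blacklist check could look like: it extracts the host from the URL and returns true when that host is a listed domain or a subdomain of one. The BLACKLIST array and its single entry are assumptions for illustration; the real plugin may match URLs differently.

```javascript
// Hypothetical sanitizer: true if the URL's host is a blacklisted tracking
// domain or a subdomain of one. BLACKLIST entries are examples only.
var BLACKLIST = ['quantserve.com'];

function sanitizer(url) {
  // Extract the host from absolute ("https://...") or protocol-relative ("//...") URLs
  var match = /^(?:[a-z][a-z0-9+.-]*:)?\/\/([^\/?#:]+)/i.exec(url);
  if (!match) {
    return false; // relative URL: same site, never blocked here
  }
  var host = match[1].toLowerCase();
  for (var i = 0; i < BLACKLIST.length; i++) {
    var domain = BLACKLIST[i];
    // Exact match, or a subdomain such as edge.quantserve.com
    if (host === domain || host.slice(-(domain.length + 1)) === '.' + domain) {
      return true;
    }
  }
  return false;
}
```

Matching on the host rather than doing a substring search on the whole URL avoids false positives such as https://example.com/quantserve.com, and the leading-dot check stops notquantserve.com from matching.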

I’m working on a more generic version of an AOP-based WordPress Privacy plugin now. In a first stage it will probably be based on a blacklist, which will be editable in the WP Privacy options screen, but at a later date a whitelist-based approach will be added (based on an integration with webpagetest.org). Let’s add that to my New Year’s resolutions for 2012, shall we?

WP Privacy: Quantcast sneaks back in

After almost a year of peace and quiet, Quantcast tracking code has returned to this blog. As reported by Brian Yang, the stupid hack that stopped the code from being included doesn’t work any more. Automattic recently switched to the new Quantcast code, which, instead of using the old-fashioned document.write, now gets inserted asynchronously by a DOM method (insertBefore). I’m looking at ways to stop this from happening, or at least to limit it one way or another, but for the time being there’s no fix. Bear with me, and do speak up (in the comments below or via the contact form) if you think you can help!

Quercus PHP on GAE: pining for file handles

Quercus really is great stuff; it allows nitwits like me to develop crappy PHP-applications and to deploy them on Google’s App Engine. But when you combine the limitations of Quercus’ PHP implementation with those of GAE, you’re going to have to code around some problems you wouldn’t be facing when developing a “normal” PHP webapp.
One example based on my limited experience (while writing a scanner service to detect “foreign” objects in websites for my future wp-privacy plugin): I had a CSV file that had to be downloaded and parsed. Normally you would fopen the remote file and then use fgetcsv to retrieve the data line by line. Or, if you’d prefer, you could fetch the file with mighty cURL and parse it using str_getcsv. But those approaches don’t work when running Quercus on GAE: fopening remote files doesn’t work (blame GAE), and while you can cURL the CSV into a variable, there’s no str_getcsv in Quercus (yet).
So I did what any self-respecting non-developer would do: I cried for help on StackOverflow. Some of the advice I got there involved obscure tricks like using data URIs, fopening php://memory or using SplTempFileObject, but none of those solutions produced anything but errors. So no built-in CSV parsing for me, but (simple) “manual” parsing of the CSV held in a string. Not a huge problem by any measure, but I’m sure there are a whole lot more limitations, if only for all those functions that rely on file handles. But at least we’re having fun, no? 😉
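What “manual” parsing of a CSV string involves can be sketched as a small character-by-character state machine. To keep all examples on this blog in one language it is written in JavaScript rather than PHP, and it is an illustration, not the code the scanner service actually uses; the same loop ports directly to PHP with strlen() and string offsets. It handles commas, double-quoted fields, doubled quotes inside quoted fields and \n or \r\n line endings, but is not a complete RFC 4180 validator.

```javascript
// Minimal CSV parser for a string already in memory (e.g. fetched over HTTP).
// Returns an array of rows, each an array of field strings.
function parseCsv(text) {
  var rows = [];
  var row = [];
  var field = '';
  var inQuotes = false;
  for (var i = 0; i < text.length; i++) {
    var c = text[i];
    if (inQuotes) {
      if (c === '"') {
        if (text[i + 1] === '"') { field += '"'; i++; } // "" inside quotes = literal quote
        else { inQuotes = false; }                     // closing quote
      } else {
        field += c; // commas and newlines are literal inside quotes
      }
    } else if (c === '"') {
      inQuotes = true;
    } else if (c === ',') {
      row.push(field);
      field = '';
    } else if (c === '\n' || c === '\r') {
      if (c === '\r' && text[i + 1] === '\n') { i++; } // treat \r\n as one line break
      row.push(field);
      rows.push(row);
      row = [];
      field = '';
    } else {
      field += c;
    }
  }
  // Flush the last row unless the text ended with a line break
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}
```

For example, parseCsv('a,b\n"c,d","e""f"\r\n') yields two rows, ['a', 'b'] and ['c,d', 'e"f'].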