Damn, Venus doesn’t love noscript!
You’ve got no clue what I’m rambling about, do you? Well, allow me to explain;
- Venus is a planet-like web application that aggregates rss- and atom feeds from a community or about a specific subject (see e.g. planet mozilla, planet debian and the much-loved planet grep)
So I started looking at the Venus source and mailed with Planet Grep’s Wouter Verhelst to solve this issue. At first sight the solution seemed pretty straightforward; Venus shouldn’t ‘escape’ noscript but should instead just strip the opening and closing noscript-tag. Wouter installed a small sed-filter I wrote and added noscript to the whitelist of Venus’s sanitizer (which is based on Universal Feed Parser) and … it did not work.
The problem apperantly is with another sanitizing component in Venus; html5lib. Sam Ruby, the developer of Venus, wrote on the mailinglist;
There are multiple sanitization passes involved here. […] The html5parser seems to think that noscript is to be parsed as text only, which would result in the behavior that you describe. Looking at the current HTML5 spec, it appears that this does not match the expected behavior — so perhaps that changed too.
So I started looking at html5lib and … well, I’m stuck, html5lib is a pretty complex beast for a smalltime non-developer to dive into. So earlier today I turned to the html5lib discussion list to ask how sanitization can be configured not to escape noscript, let’s hope someone will enlighten me. Because until then those poor Planet Greppers won’t be able to see (a thumbnail of) Al Jarreau’s great version of Take Five way back in 1976: