Tag Archives: regular expression

Irregular Expressions have your stack for lunch

I love me some regular expressions (problems), but have you ever seen one crash Apache? Well I have! This regex is part of YUI-CSS-compressor-PHP-port, the external CSS minification component in Autoptimize, my WordPress JS/CSS optimization plugin:

/(?:^|\})(?:(?:[^\{\:])+\:)+(?:[^\{]*\{)/)/

yo regex dawgExecuting that on a large chunk of CSS (lots of selectors for one declaration block, which cannot be ripped apart) triggers a stack overflow in PCRE, which crashes Apache and shows up as a “connection reset”-error in the browser.

Regular expression triggered segfaults are no exception in the PHP bugtracker and each and every of those tickets gets labeled “Not a bug” while pointing the finger at PCRE, which in their man-pages and in their bug tracker indeed confirm that stack overflows can occur. This quote from that PCRE bug report says it all, really;

If you are running into a problem of stack overflow, you have the
following choices:

  (a) Work on your regular expression pattern so that it uses less 
      memory. Sometimes using atomic groups can help with this.
  (b) Increase the size of your process stack.
  (c) Compile PCRE to use the heap instead of the stack.
  (d) Set PCRE's recursion limit small enough so that it gives an error
      before the stack overflows.

Are you scared yet? I know I am. But this might be a consolation; if you test your code on xampp (or another Apache on Windows version), you’re bound to detect the problem early on, as the default threadstacksize there is a mere 1MB instead of the whopping 8MB on Linux.

As for the problem in YUI-CSS-compressor-PHP-port; I logged it on their Github issue-list and I think I might just have a working alternative which will be in Autoptimize 1.8.

You gotta love HTML5’s input types and patterns

While working on updates of the admin-screen of Autoptimize, I wanted some checks on the URL a user can enter for a CDN. At first I thought I’d do some jQuery-based validation but then I came accross a page (on StackOverflow I guess) that mentioned the new input types and the use of validation patterns, so now I just have this beauty in place;

<input type="url" pattern="^(https?:)?\/\/([\da-z\.-]+)\.([\da-z\.]{2,6})([\/\w \.-]*)*\/?$" />

And to make sure the user gets a visual indication if the string isn’t a valid URL (according to the regex) there’s this CSS oneliner:

input[type=url]:invalid {color: red; border-color:red;}

The nice thing about this is that it is pretty backwards compatible;

  1. browsers that do not know type=”url” will just consider it type=”text” and can still enforce the pattern (if supported).
  2. browsers who support type=”url” but not the pattern, will do more basic validation.

And for browsers that do not support type=”url” nor patterns, there are multiple polyfills available that force older browsers to comply as well. But who cares about older browsers, right?