Irregular Expressions have your stack for lunch

I love me some regular expressions (problems), but have you ever seen one crash Apache? Well, I have! This regex is part of YUI-CSS-compressor-PHP-port, the external CSS minification component in Autoptimize, my WordPress JS/CSS optimization plugin:

/(?:^|\})(?:(?:[^\{\:])+\:)+(?:[^\{]*\{)/

Executing that on a large chunk of CSS (lots of selectors for one declaration block, which cannot be ripped apart) triggers a stack overflow in PCRE, which crashes Apache and shows up as a “connection reset” error in the browser.

Segfaults triggered by regular expressions are no exception in the PHP bug tracker, and each and every one of those tickets gets labeled “Not a bug” while pointing the finger at PCRE, whose man pages and bug tracker indeed confirm that stack overflows can occur. This quote from that PCRE bug report says it all, really:

If you are running into a problem of stack overflow, you have the
following choices:

  (a) Work on your regular expression pattern so that it uses less 
      memory. Sometimes using atomic groups can help with this.
  (b) Increase the size of your process stack.
  (c) Compile PCRE to use the heap instead of the stack.
  (d) Set PCRE's recursion limit small enough so that it gives an error
      before the stack overflows.
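
To make options (a) and (d) a bit more concrete, here is a minimal PHP sketch. It is not the fix that went into Autoptimize or the YUI port: the constant names and the guarded_match() helper are made up for illustration, and the limit of 10000 is an arbitrary assumption. The rewritten pattern follows the general advice in PCRE's pcrestack documentation to let a possessive character-class repeat swallow a run of characters in one item, rather than repeating a group once per character. Also note that, as far as I know, the recursion limit may not apply when PCRE's JIT is active, so treat the guard as a safety net rather than a guarantee.

```php
<?php
// Sketch only: these names do not come from Autoptimize or the YUI port.

// The pattern from this post.
const ORIGINAL_PATTERN = '/(?:^|\})(?:(?:[^\{\:])+\:)+(?:[^\{]*\{)/';

// Option (a): same intent, but each run of characters is consumed by a
// single possessive character-class repeat instead of a repeated group.
const POSSESSIVE_PATTERN = '/(?:^|\})(?:[^\{\:]++\:)+[^\{]*+\{/';

// Option (d): lower pcre.recursion_limit around the call so that PCRE
// reports an error instead of recursing until the stack is gone.
// Returns true (match), false (no match) or null (PCRE gave up).
function guarded_match(string $pattern, string $subject, int $limit = 10000): ?bool
{
    // ini_set() returns the previous value, or false if it could not be set.
    $previous = ini_set('pcre.recursion_limit', (string) $limit);
    $result = preg_match($pattern, $subject);
    if ($previous !== false) {
        ini_set('pcre.recursion_limit', $previous); // restore the old limit
    }
    if ($result === false) {
        // preg_last_error() says which limit was hit,
        // e.g. PREG_RECURSION_ERROR or PREG_BACKTRACK_LIMIT_ERROR.
        error_log('PCRE error code: ' . preg_last_error());
        return null;
    }
    return $result === 1;
}

$css = 'a:hover,a:focus{color:red}';
var_dump(guarded_match(ORIGINAL_PATTERN, $css));
var_dump(guarded_match(POSSESSIVE_PATTERN, $css));
```

A null return is the cue to fall back to something safer, for example leaving that chunk of CSS unminified instead of taking the web server down with it.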

Are you scared yet? I know I am. But this might be a consolation: if you test your code on XAMPP (or another Apache-on-Windows build), you’re bound to detect the problem early on, as the default ThreadStackSize there is a mere 1MB instead of the whopping 8MB on Linux.

As for the problem in YUI-CSS-compressor-PHP-port: I logged it on their GitHub issue list and I think I might just have a working alternative, which will be in Autoptimize 1.8.

2 thoughts on “Irregular Expressions have your stack for lunch”

  1. Tom Pester

    Hi Frank,

    I am fond of Regular Expressions (and not afraid to admit it). If you need some assistance, let me know, with the regex and the input that causes the problem. Most of the time, perf problems (and, as a consequence, bumping against timeouts and limits) are due to “catastrophic backtracking”, a sign that the regex works well enough on small input but performs exponentially worse on larger input.

    1. frank Post author

      Thanks Tom! The problem was with recursion in the regex which indeed worked on small input but broke on a (significantly) larger string. Next time I need a regex fixed I’ll send you a mail instead of doing all that trial and error :-)
