Wednesday, July 04, 2007

Regex magic

First of all, I want to apologize to my readers (both of them :-)) for being AWOL, but real life sometimes interferes pretty badly.

I have always been a big fan of regular expressions, and one of the main reasons I love Perl is that they are so deeply integrated into it and feel natural to use. (Of course, there are negative aspects one must be aware of, like speed or the fact that they can sometimes be quite hard to read.) To deal with the latter problem, here is a link to a Perl module which tries to dissect a regular expression and explain, step by step, what it does:

YAPE::Regex::Explain. Be aware that it depends on YAPE::Regex, but this dependency is not declared in the package, so running install YAPE::Regex::Explain will fail unless it is preceded by install YAPE::Regex. This should happen automatically, and it would if the package had been built properly. Running a regular expression through this module produces output like the following:

The regular expression:

(?-imsx:a+?)

matches as follows:
  
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  a+?                      'a' (1 or more times (matching the least
                           amount possible))
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
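
For reference, output like the above can be produced with a few lines of Perl. This is only a minimal sketch: it assumes both YAPE::Regex and YAPE::Regex::Explain are already installed (in that order, as noted above), and the exact text printed may differ between Perl and module versions.

```perl
use strict;
use warnings;

# Assumption: YAPE::Regex was installed first, then YAPE::Regex::Explain.
use YAPE::Regex::Explain;

# The same non-greedy pattern as in the sample output above.
my $regex = qr/a+?/;

# new() takes the regular expression; explain() returns the
# node-by-node description as one string.
my $explanation = YAPE::Regex::Explain->new($regex)->explain;

print $explanation;
```

The pattern is passed as a qr// object here, but a plain string holding the pattern should work as well.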

Another interesting module I came across thanks to this blog post is Regexp::Assemble, which combines several regular expressions into one big expression that matches anything the original expressions would have matched (so it is the union of the regular expressions), and the result is also optimized! Wicked cool.
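
To give an idea of how this looks in practice, here is a small sketch, assuming Regexp::Assemble is installed from CPAN. The four patterns and the test strings are made up for illustration; the point is only that the assembled expression accepts whatever any of the individual patterns would accept.

```perl
use strict;
use warnings;
use Regexp::Assemble;

# Hypothetical sample patterns, chosen only for illustration.
my @patterns = ('ab+c', 'ab+-', 'a\w\d+', 'a\d+');

my $ra = Regexp::Assemble->new;
$ra->add(@patterns);

# re() returns a single compiled regular expression (a qr// object)
# covering all of the patterns added so far.
my $re = $ra->re;

print "Assembled: $re\n";

# Anchor the combined expression and try it on a few strings.
for my $s ('abbc', 'a42', 'xyz') {
    printf "%-5s %s\n", $s, ($s =~ /^$re$/ ? 'matches' : 'does not match');
}
```

Printing the assembled expression is a quick way to see how much of the common structure the module has managed to factor out.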
