Musings and frustrations

November 19, 2007

Regular Expressions.

Filed under: Regular Expressions — Tags: — peteohanlon @ 4:01 pm

A while ago, I considered writing an article containing some common regular expressions. For one reason or another, I never got round to it until somebody recently emailed me to ask whether I’d done it. Feeling guilty, I decided to push ahead. Over the next few days, I’ll be posting about my experiences pulling these regular expressions together and discussing how they work.

Anyway, the first regular expression is simple enough. It grabs all of the tags from a piece of HTML-like input:

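// Requires: using System.Text.RegularExpressions;
// Returns every opening, closing and self-closing tag found in the input,
// including any attributes (quoted or unquoted) inside the tag.
public static MatchCollection GrabTags(string value)
{
    Regex regex = new Regex(
        @"</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>",
        RegexOptions.IgnoreCase
        | RegexOptions.Multiline
        | RegexOptions.IgnorePatternWhitespace);
    return regex.Matches(value);
}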
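
To see what the method returns, here is a minimal sketch of a console program that calls it. The class name TagGrabberDemo and the sample HTML string are made up for illustration; only GrabTags itself comes from the snippet above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class, used only so the example compiles on its own.
public static class TagGrabberDemo
{
    // Same pattern and options as the GrabTags method shown above.
    public static MatchCollection GrabTags(string value)
    {
        Regex regex = new Regex(
            @"</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>",
            RegexOptions.IgnoreCase
            | RegexOptions.Multiline
            | RegexOptions.IgnorePatternWhitespace);
        return regex.Matches(value);
    }

    public static void Main()
    {
        // Assumed sample input: an opening tag with an attribute,
        // a nested pair of tags, and a self-closing tag.
        string html = "<p class=\"intro\">Hello <b>world</b><br /></p>";

        // Each Match holds one complete tag; the text between tags is skipped.
        foreach (Match match in GrabTags(html))
        {
            Console.WriteLine(match.Value);
        }
    }
}
```

Run against that sample, this should print the five tags, one per line: the opening p tag with its class attribute, then the opening and closing b tags, the self-closing br tag and the closing p tag. The words "Hello" and "world" are not part of any match.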

Tomorrow, we’ll break this one down into its constituent parts and talk about how it all fits together.
