Keeping it regular

Last week I posted an example of using a regular expression to control the input of numbers into a TextBox. A couple of my fellow disciples commented that they’d just use a regular expression Behavior or dependency property (DP) to control the text input. Still, there are a couple of reasons why I’d use the dedicated NumericTextBoxBehavior.

The first reason is that it’s a simple control for those who aren’t comfortable writing regular expressions. After all, why should you write a complex regular expression when I can write one for you?

The second reason is that the numeric control is internationalised from the get-go. I’ve already taken care of the whole internationalised decimal separator issue, so you don’t have to worry about it in your regular expression.

That said, the regular expression behavior is a cracking idea, and one I could kick myself for not thinking of earlier. To add regular expression functionality to your TextBox, all you need to do is add the following code:

namespace Goldlight.Base.Behaviors
{
  using System.Linq;
  using System.Windows.Controls;
  using System.Windows.Interactivity;
  using System.Windows;
  using System.Windows.Input;
  using System.Text.RegularExpressions;

  /// <summary>
  /// Apply this behavior to a TextBox to ensure that input matches a regular
  /// expression.
  /// </summary>
  /// <remarks>
  /// <para>
  /// In the view, this behavior is attached in the following way:
  /// <code>
  /// <TextBox Text="{Binding Price}">
  ///   <i:Interaction.Behaviors>
  ///     <gl:RegularExpressionTextBoxBehavior Mask="^\d*(\.\d{0,2})?$" />
  ///   </i:Interaction.Behaviors>
  /// </TextBox>
  /// </code>
  /// The mask is tested against the whole prospective contents of the
  /// TextBox, so it must also accept partially typed values.
  /// </para>
  /// <para>
  /// Add references to System.Windows.Interactivity to the view to use
  /// this behavior.
  /// </para>
  /// </remarks>
  public class RegularExpressionTextBoxBehavior : Behavior<TextBox>
  {
    /// <summary>
    /// Gets or sets the regular expression mask.
    /// </summary>
    public string Mask { get; set; }

    #region Overrides
    protected override void OnAttached()
    {
      base.OnAttached();

      AssociatedObject.PreviewTextInput += AssociatedObject_PreviewTextInput;
#if !SILVERLIGHT
      DataObject.AddPastingHandler(AssociatedObject, OnClipboardPaste);
#endif
    }

    protected override void OnDetaching()
    {
      base.OnDetaching();
      AssociatedObject.PreviewTextInput -= AssociatedObject_PreviewTextInput;
#if !SILVERLIGHT
      DataObject.RemovePastingHandler(AssociatedObject, OnClipboardPaste);
#endif
    }
    #endregion

#if !SILVERLIGHT
    /// <summary>
    /// Handle paste operations into the textbox to ensure that the behavior
    /// is consistent with directly typing into the TextBox.
    /// </summary>
    /// <param name="sender">The TextBox sender.</param>
    /// <param name="dopea">Paste event arguments.</param>
    /// <remarks>This operation is only available in WPF.</remarks>
    private void OnClipboardPaste(object sender, DataObjectPastingEventArgs dopea)
    {
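      // Only text can be validated; any other data format is left to the TextBox.
      string text = dopea.SourceDataObject.GetData(dopea.FormatToApply) as string;

      // Whitespace is validated too, so it can't be used to sneak past the mask.
      if (text != null && !Validate(text))
        dopea.CancelCommand();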
    }
#endif

    /// <summary>
    /// Preview the text input.
    /// </summary>
    /// <param name="sender">The TextBox sender.</param>
    /// <param name="e">The composition event arguments.</param>
    void AssociatedObject_PreviewTextInput(object sender, TextCompositionEventArgs e)
    {
      e.Handled = !Validate(e.Text);
    }

    /// <summary>
    /// Validate the contents of the textbox with the new content to see if it is
    /// valid.
    /// </summary>
    /// <param name="value">The text to validate.</param>
    /// <returns>True if this is valid, false otherwise.</returns>
    protected bool Validate(string value)
    {
      TextBox textBox = AssociatedObject;
      string pattern = Mask;

      // No mask means anything goes.
      if (string.IsNullOrWhiteSpace(pattern))
        return true;

      // Build the text as it would look after the edit: the new input replaces
      // any selected text. With no selection, SelectionStart is the caret index
      // and SelectionLength is 0, so the input is simply inserted.
      string current = textBox.Text ?? string.Empty;
      int start = textBox.SelectionStart;
      int end = start + textBox.SelectionLength;
      string test = string.Concat(current.Substring(0, start), value, current.Substring(end));

      return Regex.IsMatch(test, pattern);
    }
  }
}

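As you can see, the code is similar to the NumericTextBoxBehavior. The only real difference is the Mask property, which holds the regular expression. Because Validate tests the mask against the whole of the TextBox’s prospective contents on every keystroke or paste, the mask has to accept partially entered values as well as finished ones; a mask that only matches a complete value, such as a fully anchored phone number, blocks the very first character.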
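To see why the mask has to tolerate partial input, here is a minimal console sketch of the same splice-and-test logic as Validate, with the WPF TextBox replaced by plain strings. MaskPreview and WouldAccept are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper that mirrors Validate without needing a WPF TextBox.
public static class MaskPreview
{
    // Returns true if typing (or pasting) 'input' over the current selection
    // would leave text that still matches 'mask'.
    public static bool WouldAccept(string current, int selectionStart,
        int selectionLength, string input, string mask)
    {
        // An empty mask accepts everything, as in the behavior.
        if (string.IsNullOrWhiteSpace(mask))
            return true;

        // The new input replaces the selection; with no selection it is
        // simply inserted at the caret.
        string candidate = current.Substring(0, selectionStart)
            + input
            + current.Substring(selectionStart + selectionLength);

        return Regex.IsMatch(candidate, mask);
    }
}
```

With a fully anchored phone mask, WouldAccept("", 0, 0, "(", mask) returns false, so the user can never type the opening bracket. A prefix-tolerant mask such as ^\d*(\.\d{0,2})?$ accepts "12." on the way to "12.50" while still rejecting a third decimal place.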

Return to Regex.

Continuing with our series on regular expressions, here’s a more complex example of regular expression testing. In this example, we’re going to see how to check a date that comes in different formats. The extra wrinkle that we are providing here is that we’re constraining the date so that it can only be in certain centuries.

First of all, we’re going to define how the date is separated out. For instance, if we use DateSeparator.Slash, then the date will be separated by the / character.

/// <summary>
/// Describes how the date is separated.
/// </summary>
public enum DateSeparator
{
    /// <summary>
    /// The parts of the date are separated by a space.
    /// </summary>
    Space,
    /// <summary>
    /// The parts of the date are separated by a slash (/)
    /// </summary>
    Slash,
    /// <summary>
    /// The parts of the date are separated by a dot (.)
    /// </summary>
    Dot,
    /// <summary>
    /// The parts of the date are separated by a hyphen (-)
    /// </summary>
    Hyphen
}

Next we’re going to define the format of the date (without separator characters). This states the order of the different parts of the date, so DateFormat.MonthYearDay corresponds to MMYYYYDD.

/// <summary>
/// This enumeration tells you the date format that you want to use.
/// </summary>
public enum DateFormat
{
    /// <summary>
    /// Formatted as YYYYMMDD
    /// </summary>
    YearMonthDay,
    /// <summary>
    /// Formatted as MMYYYYDD
    /// </summary>
    MonthYearDay,
    /// <summary>
    /// Formatted as DDMMYYYY
    /// </summary>
    DayMonthYear,
    /// <summary>
    /// Formatted as YYMMDD
    /// </summary>
    ShortYearMonthDay,
    /// <summary>
    /// Formatted as MMYYDD
    /// </summary>
    ShortMonthYearDay,
    /// <summary>
    /// Formatted as DDMMYY
    /// </summary>
    ShortDayMonthYear,
}

Finally we provide the methods that actually parse the date. There are three overloads of IsValidDate; the shorter ones supply default values and delegate to the full version.

using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// The IsValidDate overloads below live inside a static class of your choice.

/// <summary>
/// Checks to see if this is a valid date.
/// </summary>
/// <param name="date">The date to check.</param>
/// <param name="separator">The <see cref="DateSeparator"/> for this date.</param>
/// <returns>True if the date is valid, false otherwise.</returns>
public static bool IsValidDate(string date, DateSeparator separator)
{
    return IsValidDate(date, separator, DateFormat.YearMonthDay);
}

/// <summary>
/// Checks to see if this is a valid date.
/// </summary>
/// <param name="date">The date to check.</param>
/// <param name="separator">The <see cref="DateSeparator"/> for this date.</param>
/// <param name="format">The <see cref="DateFormat"/> for this date.</param>
/// <returns>True if the date is valid, false otherwise.</returns>
public static bool IsValidDate(string date, 
    DateSeparator separator, 
    DateFormat format)
{
    List<int> yearConstrain = new List<int>();
    yearConstrain.Add(19);
    yearConstrain.Add(20);
    return IsValidDate(date, separator, format, yearConstrain);
}

/// <summary>
/// Checks to see if this is a valid date.
/// </summary>
/// <param name="date">The date to check.</param>
/// <param name="separator">The <see cref="DateSeparator"/> for this date.</param>
/// <param name="format">The <see cref="DateFormat"/> for this date.</param>
/// <param name="yearConstrain">Any century periods that the 
/// date needs constraining to.</param>
/// <returns>True if the date is valid, false otherwise.</returns>
public static bool IsValidDate(string date, 
    DateSeparator separator, 
    DateFormat format, 
    List<int> yearConstrain)
{
    if (string.IsNullOrEmpty(date))
        throw new ArgumentNullException("date");
    // Ensure that we have some base periods to check for "long" date formats.
    if (format == DateFormat.DayMonthYear || 
        format == DateFormat.MonthYearDay || 
        format == DateFormat.YearMonthDay)
    {
        if (yearConstrain == null)
            yearConstrain = new List<int>();
        if (yearConstrain.Count == 0)
        {
            yearConstrain.Add(19);
            yearConstrain.Add(20);
        }
    }

    string dayFormat = "(0[1-9]|[12][0-9]|3[01])";
    string monthFormat = "(0[1-9]|1[012])";
    StringBuilder years = new StringBuilder();
    if (yearConstrain != null && yearConstrain.Count > 0)
    {
        foreach (int i in yearConstrain)
        {
            years.AppendFormat("{0}|", i);
        }
        // Drop the trailing "|". Short formats can arrive with no
        // constraints at all, so only trim when something was appended.
        years.Length -= 1;
    }
    string yearFormatLong = string.Format(@"({0})\d\d", years);
    string yearFormatShort = @"\d\d";
    // Anchored so that the whole string must be a date, not just contain one.
    string baseFormat = "^{0}[{3}]{1}[{3}]{2}$";
    int yearPos = 0;
    int yearLength = 4;
    int monthPos = 0;
    int dayPos = 0;
    string sep = string.Empty;
    switch (separator)
    {
        case DateSeparator.Dot:
            sep = ".";
            break;
        case DateSeparator.Hyphen:
            sep = "-";
            break;
        case DateSeparator.Slash:
            sep = "/";
            break;
        case DateSeparator.Space:
            sep = " ";
            break;
    }
    switch (format)
    {
        case DateFormat.DayMonthYear:
            dayPos = 0;
            monthPos = 1;
            yearPos = 2;
            baseFormat = string.Format(baseFormat, 
                dayFormat, 
                monthFormat, 
                yearFormatLong, 
                sep);
            break;
        case DateFormat.MonthYearDay:
            monthPos = 0;
            yearPos = 1;
            dayPos = 2;
            baseFormat = string.Format(baseFormat, 
                monthFormat, 
                yearFormatLong, 
                dayFormat, 
                sep);
            break;
        case DateFormat.ShortDayMonthYear:
            dayPos = 0;
            monthPos = 1;
            yearPos = 2;
            yearLength = 2;
            baseFormat = string.Format(baseFormat, 
                dayFormat, 
                monthFormat, 
                yearFormatShort, 
                sep);
            break;
        case DateFormat.ShortMonthYearDay:
            monthPos = 0;
            yearPos = 1;
            dayPos = 2;
            yearLength = 2;
            baseFormat = string.Format(baseFormat, 
                monthFormat, 
                yearFormatShort, 
                dayFormat, 
                sep);
            break;
        case DateFormat.ShortYearMonthDay:
            yearPos = 0;
            monthPos = 1;
            dayPos = 2;
            yearLength = 2;
            baseFormat = string.Format(baseFormat, 
                yearFormatShort, 
                monthFormat, 
                dayFormat, 
                sep);
            break;
        case DateFormat.YearMonthDay:
            yearPos = 0;
            monthPos = 1;
            dayPos = 2;
            baseFormat = string.Format(baseFormat, 
                yearFormatLong, 
                monthFormat, 
                dayFormat, 
                sep);
            break;
    }
    // No RegexOptions are needed: the pattern only contains digits and the
    // separator, and Multiline would let ^ and $ match at line breaks.
    Regex regex = new Regex(baseFormat);
    Match m = regex.Match(date);
    bool success = m.Success;
    // Now, we need to do a little bit more processing 
    // (taking care of invalid dates).
    if (success)
    {
        char sepChar = char.Parse(sep);
        string[] splitDate = date.Split(sepChar);
        int day = int.Parse(splitDate[dayPos]);
        int month = int.Parse(splitDate[monthPos]);
        int year = int.Parse(splitDate[yearPos]);
        // Two-digit years are treated as 20xx for the leap-year test.
        if (yearLength == 2)
            year = 2000 + year;

        if (month == 2)
        {
            if (day > 29)
                success = false;
            else
            {
                if (day == 29 && !(year % 4 == 0 && 
                    (year % 100 != 0 || year % 400 == 0)))
                    success = false;
            }
        }
        else
        {
            if (day > 30)
            {
                // The following months can only have 30 days.
                if (month == 9 || month == 4 || month == 6 || month == 11)
                    success = false;
            }
        }
    }
    return success;
}

The first thing that we do is check the input values to make sure they look valid. If no year constraints are supplied for one of the long (four-digit year) formats, we default to the centuries 19 and 20, which covers the years 1900 to 2099 in the Gregorian calendar.

    if (string.IsNullOrEmpty(date))
        throw new ArgumentNullException("date");
    // Ensure that we have some base periods to check for "long" date formats.
    if (format == DateFormat.DayMonthYear || 
        format == DateFormat.MonthYearDay || 
        format == DateFormat.YearMonthDay)
    {
        if (yearConstrain == null)
            yearConstrain = new List<int>();
        if (yearConstrain.Count == 0)
        {
            yearConstrain.Add(19);
            yearConstrain.Add(20);
        }
    }

The next section is where we start to flesh out what we are actually going to use as the regular expression. We constrain the day format to be from 01 through to 31.

    string dayFormat = "(0[1-9]|[12][0-9]|3[01])";

The month format is constrained to 01 through to 12.

    string monthFormat = "(0[1-9]|1[012])";
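A quick way to convince yourself of what these two fragments accept is to anchor them and try a few values. This is a sketch: DatePartPatterns, IsDay and IsMonth are hypothetical names, and the ^ and $ anchors are added here only so that the whole value must match.

```csharp
using System.Text.RegularExpressions;

public static class DatePartPatterns
{
    // The day and month fragments from the article.
    public const string DayFormat = "(0[1-9]|[12][0-9]|3[01])";
    public const string MonthFormat = "(0[1-9]|1[012])";

    // Two digits, 01 to 31.
    public static bool IsDay(string value)
    {
        return Regex.IsMatch(value, "^" + DayFormat + "$");
    }

    // Two digits, 01 to 12.
    public static bool IsMonth(string value)
    {
        return Regex.IsMatch(value, "^" + MonthFormat + "$");
    }
}
```

Note that 31 passes the day pattern whatever the month is, which is why a calendar check is still needed after the match.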

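Next we loop through the year constraints. This is where things become a little more complicated. Because we cater for both long and short date formats, we build two year patterns: the long one only allows four-digit years whose first two digits are one of the permitted centuries, while the short one accepts any two digits.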

    StringBuilder years = new StringBuilder();
    if (yearConstrain != null && yearConstrain.Count > 0)
    {
        foreach (int i in yearConstrain)
        {
            years.AppendFormat("{0}|", i);
        }
        // Drop the trailing "|" (only when something was appended).
        years.Length -= 1;
    }
    string yearFormatLong = string.Format(@"({0})\d\d", years);
    string yearFormatShort = @"\d\d";
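To see what the loop produces, this sketch builds the same long-year fragment with string.Join instead of a StringBuilder (an equivalent formulation, not the code above) and tests a few years against it. YearPatterns, LongYear and MatchesWhole are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class YearPatterns
{
    // Centuries {19, 20} become "(19|20)\d\d": the first two digits must be
    // one of the permitted centuries, the last two are free.
    public static string LongYear(IEnumerable<int> centuries)
    {
        return string.Format(@"({0})\d\d", string.Join("|", centuries));
    }

    // Short years are not constrained at all.
    public const string ShortYear = @"\d\d";

    // Anchored test so the whole value must be a year.
    public static bool MatchesWhole(string value, string yearPattern)
    {
        return Regex.IsMatch(value, "^" + yearPattern + "$");
    }
}
```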

Then we build up the format of the search that we will use. The base format consists of a number of placeholders that string.Format fills in: {3} is the separator character, wrapped in a character class so that a dot is matched literally, and {0}, {1} and {2} are the day, month and year patterns in the order specified by the DateFormat. The ^ and $ anchors ensure that the whole string has to be a date, rather than merely containing one.

    string baseFormat = "^{0}[{3}]{1}[{3}]{2}$";
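Filling in the placeholders is plain string.Format. The sketch below composes the DayMonthYear pattern for a given separator, using the anchored form of the base format; PatternComposition, DayMonthYear and Matches are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class PatternComposition
{
    // {0}, {1}, {2} are the date parts in DateFormat order; {3} is the separator.
    private const string BaseFormat = "^{0}[{3}]{1}[{3}]{2}$";

    public static string DayMonthYear(string separator)
    {
        return string.Format(BaseFormat,
            "(0[1-9]|[12][0-9]|3[01])", // day
            "(0[1-9]|1[012])",          // month
            @"(19|20)\d\d",             // long year, centuries 19 and 20
            separator);
    }

    public static bool Matches(string date, string separator)
    {
        return Regex.IsMatch(date, DayMonthYear(separator));
    }
}
```

With "." as the separator, "31.12.1999" matches but "31x12x1999" does not, because [.] is a literal dot. The pattern alone still accepts "29/02/2013", which is what the post-match check is for.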

You may notice that we do a little more checking once we get a match back. The regular expression only guarantees a day between 01 and 31 and a month between 01 and 12, so this final check catches impossible dates such as 30th February or 31st April, as well as 29th February in a non-leap year. Finally, you can call it like this:

    string testdate = "2100/02/29";
    List<int> list = new List<int>();
    list.Add(21);
    if (!IsValidDate(testdate, DateSeparator.Slash, DateFormat.YearMonthDay, list))
    {
      Console.WriteLine("This is an invalid date.");
    }
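The post-match check boils down to the Gregorian leap-year rule plus the list of 30-day months (2100 fails above because it is divisible by 100 but not by 400). This sketch pulls that logic into a hypothetical CalendarCheck class so it can be tested on its own; the framework's DateTime.IsLeapYear and DateTime.DaysInMonth give the same answers.

```csharp
using System;

public static class CalendarCheck
{
    // Divisible by 4, except century years, unless divisible by 400.
    public static bool IsLeapYear(int year)
    {
        return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
    }

    // The regex has already limited day to 1-31 and month to 1-12,
    // so only month lengths remain to be checked.
    public static bool DayFitsMonth(int day, int month, int year)
    {
        if (month == 2)
            return day <= 28 || (day == 29 && IsLeapYear(year));
        if (month == 4 || month == 6 || month == 9 || month == 11)
            return day <= 30;
        return true;
    }
}
```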

I hope that this whets your appetite for delving into regular expressions and shows how powerful they can be when combined with other coding techniques to build a more flexible system. I know you can use DateTime.Parse to do a lot of what I have just shown, but this was intended to demonstrate how regular expressions can be composed.
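For comparison, here is a sketch of the DateTime route, assuming the framework’s calendar rules are what you want. DateTime.TryParseExact does the month-length and leap-year work, and only the century constraint is checked by hand. ParseExactAlternative is a hypothetical name, and the format strings are .NET custom date format strings rather than the DateFormat enum. One difference worth knowing: for a two-digit yy year, TryParseExact maps the century using the calendar’s TwoDigitYearMax setting rather than always assuming 20xx.

```csharp
using System;
using System.Globalization;

public static class ParseExactAlternative
{
    // Centuries are the first two digits of the year, as in the article
    // (19 means 1900-1999). No centuries means no constraint.
    public static bool IsValidDate(string date, string format, params int[] centuries)
    {
        DateTime parsed;
        if (!DateTime.TryParseExact(date, format, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out parsed))
            return false;

        return centuries.Length == 0 || Array.IndexOf(centuries, parsed.Year / 100) >= 0;
    }
}
```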

Finally, I promised that we’d revisit the grab tags regex. In short, it grabs tags even when a quoted attribute value contains < or > characters.

Regular Expressions revisited.

Well – it’s time to visit Regex land again. Thanks to Mustafa for giving me the push I needed to continue our journey into regular expressions. The following regular expression is one that I find handy from time to time for identifying words that occur near each other. For instance, suppose I want to find every instance of "time" in this paragraph where there is another mention of "time" within 10 words. This expression makes that easy, and would be called as MatchCollection coll = FindNear(<<paragraph text>>, "Time", "Time", 1, 10);

One handy extra is that you can also set the minimum number of words that must separate the two terms.

Anyway – without further ado, here’s the FindNear function.

/// <summary>
/// Using this method, you can find instances of a particular word near other text. 
/// For instance, you can find Dr near Who when the words occur near each other.
/// This is achieved by constraining the distance of the words that must be between the
/// instances.
/// </summary>
/// <param name="text">The text to search.</param>
/// <param name="findText">The text to find.</param>
/// <param name="nearText">The text to find the text near.</param>
/// <param name="minWords">The minimum number of words the two words can be apart.</param>
/// <param name="maxWords">The maximum number of words the two words can be apart.</param>
/// <returns>A match collection containing the find results.</returns>
public static MatchCollection FindNear(string text, 
    string findText, 
    string nearText, 
    int minWords, 
    int maxWords)
{
    if (string.IsNullOrEmpty(text))
        throw new ArgumentNullException("text");
    if (string.IsNullOrEmpty(findText))
        throw new ArgumentNullException("findText");
    if (string.IsNullOrEmpty(nearText))
        throw new ArgumentNullException("nearText");
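    if (minWords < 0)
        throw new ArgumentOutOfRangeException("minWords");
    if (minWords > maxWords)
        throw new ArgumentOutOfRangeException("minWords");
    if (maxWords == 0)
        throw new ArgumentOutOfRangeException("maxWords");
    // Escape the search terms so that characters such as + or . in them
    // are matched literally rather than treated as regex syntax.
    string reg = @"\b" + Regex.Escape(findText) + 
                    @"\W+(?:\w+\W+){" + 
                    minWords + 
                    "," + 
                    maxWords +"}?" + 
                    Regex.Escape(nearText) + @"\b";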
    Regex regex = new Regex(reg,
        RegexOptions.IgnoreCase
        | RegexOptions.Multiline
        | RegexOptions.IgnorePatternWhitespace
        );
    return regex.Matches(text);
}

So, how does it work? Well, it builds up the following regular expression (in the case of the above example):

\bTime\W+(?:\w+\W+){1,10}?Time\b

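The “magic” part is the {1,10}, which tells the expression how many words can sit between the two terms you are searching for; in this case, from 1 to 10 words. The ? that follows makes the quantifier lazy, so each match ends at the nearest qualifying occurrence of the second term rather than the furthest one.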
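To experiment with the quantifier, this sketch builds the same pattern shape as FindNear and counts matches in a sample sentence. NearDemo, BuildPattern and CountNear are hypothetical names; the sketch escapes both terms with Regex.Escape so that text such as C++ is matched literally instead of causing a pattern parse error.

```csharp
using System.Text.RegularExpressions;

public static class NearDemo
{
    // Same shape as FindNear: findText, then between minWords and maxWords
    // intervening words (lazily), then nearText.
    public static string BuildPattern(string findText, string nearText,
        int minWords, int maxWords)
    {
        return @"\b" + Regex.Escape(findText)
            + @"\W+(?:\w+\W+){" + minWords + "," + maxWords + "}?"
            + Regex.Escape(nearText) + @"\b";
    }

    public static int CountNear(string text, string findText, string nearText,
        int minWords, int maxWords)
    {
        string pattern = BuildPattern(findText, nearText, minWords, maxWords);
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }
}
```

In "Time flies when you lose track of time." the two mentions are six words apart, so a 1 to 10 window finds one match, while a 1 to 5 window or a minimum of 7 finds none.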

I haven’t forgotten the first part of the series on regular expressions – we’ll come back to that one in the next installment when I cover a different way to handle dates. In the meantime, have fun playing around with this regular expression.

Regular Expressions.

A while ago, I considered writing an article containing some common regular expressions. For one reason or another, I never got round to it until somebody recently emailed me to ask whether I’d done it. Feeling guilty, I decided to push ahead with the article. Over the next few days, I’ll be posting about my experiences pulling these regular expressions together and discussing how they work.

Anyway, the first regular expression is a simple enough one. It grabs all of the tags from a piece of HTML-like input:

public static MatchCollection GrabTags(string value)
{
    Regex regex = new Regex(
      @"</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>",
          RegexOptions.IgnoreCase
          | RegexOptions.Multiline
          | RegexOptions.IgnorePatternWhitespace
          );
    return regex.Matches(value);
}
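Here is a small sketch for trying the pattern out, assuming you only want the matched tag text. GrabTagsDemo and Tags are hypothetical names; the pattern string is copied from GrabTags. The second test case has a > inside a quoted attribute value, which the quoted-value alternatives consume without ending the tag early.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class GrabTagsDemo
{
    // Same pattern as GrabTags.
    private const string TagPattern =
        @"</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>";

    // Returns the text of every tag found, in document order.
    public static string[] Tags(string html)
    {
        return Regex.Matches(html, TagPattern, RegexOptions.IgnoreCase)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```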

Tomorrow, we’ll break this one down into its constituent parts and talk about how it all fits together.