Return to Regex.

Continuing with our series on regular expressions, here’s a more complex example of regular expression testing. In this example, we’re going to see how to check a date that can come in different formats. The extra wrinkle here is that we’re constraining the date so that it can only fall in certain centuries.

First of all, we’re going to define how the date is separated out. For instance, if we use DateSeparator.Slash, then the date will be separated by the / character.

/// <summary>
/// Describes how the date is separated.
/// </summary>
public enum DateSeparator
{
    /// <summary>
    /// The parts of the date are separated by a space.
    /// </summary>
    Space,
    /// <summary>
    /// The parts of the date are separated by a slash (/)
    /// </summary>
    Slash,
    /// <summary>
    /// The parts of the date are separated by a dot (.)
    /// </summary>
    Dot,
    /// <summary>
    /// The parts of the date are separated by a hyphen (-)
    /// </summary>
    Hyphen
}

Next we’re going to define the format of the date (without separator characters). This states the order of the different parts of the date, so DateFormat.MonthYearDay corresponds to MMYYYYDD.

/// <summary>
/// This enumeration tells you the date format that you want to use.
/// </summary>
public enum DateFormat
{
    /// <summary>
    /// Formatted as YYYYMMDD
    /// </summary>
    YearMonthDay,
    /// <summary>
    /// Formatted as MMYYYYDD
    /// </summary>
    MonthYearDay,
    /// <summary>
    /// Formatted as DDMMYYYY
    /// </summary>
    DayMonthYear,
    /// <summary>
    /// Formatted as YYMMDD
    /// </summary>
    ShortYearMonthDay,
    /// <summary>
    /// Formatted as MMYYDD
    /// </summary>
    ShortMonthYearDay,
    /// <summary>
    /// Formatted as DDMMYY
    /// </summary>
    ShortDayMonthYear,
}

Finally, we’re going to provide the methods that actually validate the date. There are three overloads of IsValidDate; the simpler ones fill in default values and call the most complete one.

/// <summary>
/// Checks to see if this is a valid date.
/// </summary>
/// <param name="date">The date to check.</param>
/// <param name="separator">The <see cref="DateSeparator"/> for this date.</param>
/// <returns>True if the date is valid, false otherwise.</returns>
public static bool IsValidDate(string date, DateSeparator separator)
{
    return IsValidDate(date, separator, DateFormat.YearMonthDay);
}

/// <summary>
/// Checks to see if this is a valid date.
/// </summary>
/// <param name="date">The date to check.</param>
/// <param name="separator">The <see cref="DateSeparator"/> for this date.</param>
/// <param name="format">The <see cref="DateFormat"/> for this date.</param>
/// <returns>True if the date is valid, false otherwise.</returns>
public static bool IsValidDate(string date, 
    DateSeparator separator, 
    DateFormat format)
{
    List<int> yearConstrain = new List<int>();
    yearConstrain.Add(19);
    yearConstrain.Add(20);
    return IsValidDate(date, separator, format, yearConstrain);
}

/// <summary>
/// Checks to see if this is a valid date.
/// </summary>
/// <param name="date">The date to check.</param>
/// <param name="separator">The <see cref="DateSeparator"/> for this date.</param>
/// <param name="format">The <see cref="DateFormat"/> for this date.</param>
/// <param name="yearConstrain">Any century periods that the 
/// date needs constraining to.</param>
/// <returns>True if the date is valid, false otherwise.</returns>
public static bool IsValidDate(string date, 
    DateSeparator separator, 
    DateFormat format, 
    List<int> yearConstrain)
{
    if (string.IsNullOrEmpty(date))
        throw new ArgumentNullException("date");
    // Ensure that we have some base periods to check for "long" date formats.
    if (format == DateFormat.DayMonthYear || 
        format == DateFormat.MonthYearDay || 
        format == DateFormat.YearMonthDay)
    {
        if (yearConstrain == null)
            yearConstrain = new List<int>();
        if (yearConstrain.Count == 0)
        {
            yearConstrain.Add(19);
            yearConstrain.Add(20);
        }
    }

    string dayFormat = "(0[1-9]|[12][0-9]|3[01])";
    string monthFormat = "(0[1-9]|1[012])";
    StringBuilder years = new StringBuilder();
    if (yearConstrain != null && yearConstrain.Count > 0)
    {
        foreach (int i in yearConstrain)
        {
            years.AppendFormat("{0}|", i);
        }
    }
    // Drop the trailing "|". TrimEnd is safe even when no constraints were
    // added, which happens for the short formats.
    string yearFormatLong = string.Format(@"({0})\d\d", 
        years.ToString().TrimEnd('|'));
    string yearFormatShort = @"\d\d";
    // \A and \z anchor the pattern so that the whole string must be a date.
    string baseFormat = @"\A{0}[{3}]{1}[{3}]{2}\z";
    int yearPos = 0;
    int yearLength = 4;
    int monthPos = 0;
    int dayPos = 0;
    string sep = string.Empty;
    switch (separator)
    {
        case DateSeparator.Dot:
            sep = ".";
            break;
        case DateSeparator.Hyphen:
            sep = "-";
            break;
        case DateSeparator.Slash:
            sep = "/";
            break;
        case DateSeparator.Space:
            sep = " ";
            break;
    }
    switch (format)
    {
        case DateFormat.DayMonthYear:
            dayPos = 0;
            monthPos = 1;
            yearPos = 2;
            baseFormat = string.Format(baseFormat, 
                dayFormat, 
                monthFormat, 
                yearFormatLong, 
                sep);
            break;
        case DateFormat.MonthYearDay:
            monthPos = 0;
            yearPos = 1;
            dayPos = 2;
            baseFormat = string.Format(baseFormat, 
                monthFormat, 
                yearFormatLong, 
                dayFormat, 
                sep);
            break;
        case DateFormat.ShortDayMonthYear:
            dayPos = 0;
            monthPos = 1;
            yearPos = 2;
            // Short formats carry a two-digit year.
            yearLength = 2;
            baseFormat = string.Format(baseFormat, 
                dayFormat, 
                monthFormat, 
                yearFormatShort, 
                sep);
            break;
        case DateFormat.ShortMonthYearDay:
            monthPos = 0;
            yearPos = 1;
            dayPos = 2;
            yearLength = 2;
            baseFormat = string.Format(baseFormat, 
                monthFormat, 
                yearFormatShort, 
                dayFormat, 
                sep);
            break;
        case DateFormat.ShortYearMonthDay:
            yearPos = 0;
            monthPos = 1;
            dayPos = 2;
            yearLength = 2;
            baseFormat = string.Format(baseFormat, 
                yearFormatShort, 
                monthFormat, 
                dayFormat, 
                sep);
            break;
        case DateFormat.YearMonthDay:
            yearPos = 0;
            monthPos = 1;
            dayPos = 2;
            baseFormat = string.Format(baseFormat, 
                yearFormatLong, 
                monthFormat, 
                dayFormat, 
                sep);
            break;
    }
    // The pattern only contains digits and the separator, so no options are needed.
    Regex regex = new Regex(baseFormat);
    Match m = regex.Match(date);
    bool success = (m.Success);
    // Now, we need to do a little bit more processing 
    // (taking care of invalid dates).
    if (success)
    {
        char sepChar = char.Parse(sep);
        string[] splitDate = date.Split(sepChar) ;
        int day = int.Parse(splitDate[dayPos]);
        int month = int.Parse(splitDate[monthPos]);
        int year = int.Parse(splitDate[yearPos]);
        // Two-digit years are assumed to fall in the 2000s.
        if (yearLength == 2)
            year = 2000 + year;

        if (month == 2)
        {
            if (day > 29)
                success = false;
            else
            {
                if (day == 29 && !(year % 4 == 0 && 
                    (year % 100 != 0 || year % 400 == 0)))
                    success = false;
            }
        }
        else
        {
            if (day > 30)
            {
                // The following months can only have 30 days.
                if (month == 9 || month == 4 || month == 6 || month == 11)
                    success = false;
            }
        }
    }
    return success;
}

The first thing that we do is check the input values to make sure they look valid. If no year constraints are supplied for a long date format, then we default to the century prefixes 19 and 20, which covers the years 1900 through 2099 (Gregorian calendar based).

    if (string.IsNullOrEmpty(date))
        throw new ArgumentNullException("date");
    // Ensure that we have some base periods to check for "long" date formats.
    if (format == DateFormat.DayMonthYear || 
        format == DateFormat.MonthYearDay || 
        format == DateFormat.YearMonthDay)
    {
        if (yearConstrain == null)
            yearConstrain = new List<int>();
        if (yearConstrain.Count == 0)
        {
            yearConstrain.Add(19);
            yearConstrain.Add(20);
        }
    }
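One thing to be aware of is that the defaulting above adds 19 and 20 to the list that the caller passed in, so an empty list comes back changed. Here is a minimal sketch of a variation that works on a copy instead. ConstraintDefaults and WithDefaults are hypothetical names used for illustration; they aren't part of the code above.

```csharp
using System.Collections.Generic;

public static class ConstraintDefaults
{
    // Hypothetical helper: returns the century prefixes to use without
    // modifying the list that was passed in.
    public static List<int> WithDefaults(List<int> yearConstrain)
    {
        if (yearConstrain == null || yearConstrain.Count == 0)
        {
            // Same defaults as the article: years starting with 19 or 20.
            List<int> defaults = new List<int>();
            defaults.Add(19);
            defaults.Add(20);
            return defaults;
        }
        // Copy, so that later changes cannot affect the caller's list.
        return new List<int>(yearConstrain);
    }
}
```

The trade-off is one extra allocation per call, which is unlikely to matter for a validation routine.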

The next section is where we start to flesh out what we are actually going to use as the regular expression. We constrain the day format to be from 01 through to 31.

    string dayFormat = "(0[1-9]|[12][0-9]|3[01])";

The month format is constrained to 01 through to 12.

    string monthFormat = "(0[1-9]|1[012])";
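Each fragment can be tried out on its own before it is composed into the full pattern. Here is a small sketch, assuming we wrap the fragment in \A and \z so that the whole input has to match. FragmentCheck and its methods are hypothetical names for illustration.

```csharp
using System.Text.RegularExpressions;

public static class FragmentCheck
{
    // The same fragments that the article uses.
    private const string DayFormat = "(0[1-9]|[12][0-9]|3[01])";
    private const string MonthFormat = "(0[1-9]|1[012])";

    // Hypothetical helper: true when the whole input matches the fragment.
    public static bool Matches(string fragment, string input)
    {
        return Regex.IsMatch(input, @"\A" + fragment + @"\z");
    }

    public static bool IsDay(string input)
    {
        return Matches(DayFormat, input);
    }

    public static bool IsMonth(string input)
    {
        return Matches(MonthFormat, input);
    }
}
```

With this, IsDay("31") and IsMonth("12") return true, while IsDay("32"), IsDay("00") and IsMonth("13") return false. A single digit such as "7" is rejected as well, because both fragments insist on two digits.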

Next we loop through the year constraints. This is where things become a little bit more complicated. Because we can cater for both long and short date formats, we need to put together constraints that suit both.

    StringBuilder years = new StringBuilder();
    if (yearConstrain != null && yearConstrain.Count > 0)
    {
        foreach (int i in yearConstrain)
        {
            years.AppendFormat("{0}|", i);
        }
    }
    // Drop the trailing "|". TrimEnd is safe even when no constraints were
    // added, which happens for the short formats.
    string yearFormatLong = string.Format(@"({0})\d\d", 
        years.ToString().TrimEnd('|'));
    string yearFormatShort = @"\d\d";
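The loop above can also be written with string.Join, which avoids having to deal with a trailing | character. This is only a sketch, and it assumes .NET 4 or later (for the string.Join overload that takes an IEnumerable). YearPattern and BuildLongYear are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class YearPattern
{
    // Hypothetical helper: builds, for example, "(19|20)\d\d" from 19 and 20.
    public static string BuildLongYear(IEnumerable<int> centuryPrefixes)
    {
        if (centuryPrefixes == null)
            throw new ArgumentNullException("centuryPrefixes");

        // Joins the prefixes with "|" and never leaves a trailing separator.
        string alternation = string.Join("|", centuryPrefixes);
        if (alternation.Length == 0)
            throw new ArgumentException("At least one prefix is required.", "centuryPrefixes");

        return "(" + alternation + @")\d\d";
    }
}
```

For the prefixes 19 and 20 this returns (19|20)\d\d, which matches 1999 and 2024 but not 2100.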

Then we build up the format of the search that we will use. The base format consists of a number of placeholders that will be replaced in the regular expression. The {3} is the separator character, and the other placeholders are the day, month and year patterns, which are put in using the order specified in the DateFormat. The \A and \z anchors make sure that the whole string has to match, not just a part of it.

    string baseFormat = @"\A{0}[{3}]{1}[{3}]{2}\z";
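To see the composition in isolation, here is a sketch that fills in the placeholders for a single format. It is a variation rather than the code above: it escapes the separator with Regex.Escape instead of putting it in a character class. PatternComposer and Compose are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class PatternComposer
{
    // Hypothetical helper: composes three fragments and a separator into
    // one anchored pattern, in the order the fragments are given.
    public static string Compose(string first, string second, string third, string separator)
    {
        // Escape the separator so that "." means a literal dot.
        string sep = Regex.Escape(separator);
        return string.Format(@"\A{0}{3}{1}{3}{2}\z", first, second, third, sep);
    }
}
```

Calling Compose(@"(19|20)\d\d", "(0[1-9]|1[012])", "(0[1-9]|[12][0-9]|3[01])", "/") gives a pattern that matches 2024/02/29 but rejects 2024-02-29 and 2024/13/01. Note that the pattern alone still accepts 2023/02/30, which is why the extra checking after the match is needed.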

You may notice that we do a little bit more checking once we get a match back. This final check ensures that nonsense dates such as 30th February, 31st April or 29th February in a non-leap year don’t pass through uncaught. Finally, to call it you can use it like this (2100 is not a leap year, so this date is rejected):

    string testdate = "2100/02/29";
    List<int> list = new List<int>();
    list.Add(21);
    if (!IsValidDate(testdate, DateSeparator.Slash, DateFormat.YearMonthDay, list))
    {
      Console.WriteLine("This is an invalid date.");
    }
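The days-in-month checking that happens after the match can be cross-checked against the framework. Here is a sketch, assuming the DateTime.DaysInMonth method from the .NET base class library. CalendarCheck and its methods are hypothetical names.

```csharp
using System;

public static class CalendarCheck
{
    // The hand-written leap year rule, as used in the article.
    public static bool IsLeapYear(int year)
    {
        return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
    }

    // Hypothetical helper: true when the day exists in that month and year.
    public static bool DayExists(int year, int month, int day)
    {
        // DaysInMonth throws outside these ranges, so guard first.
        if (year < 1 || year > 9999 || month < 1 || month > 12 || day < 1)
            return false;
        return day <= DateTime.DaysInMonth(year, month);
    }
}
```

Here DayExists(2000, 2, 29) returns true, while DayExists(2100, 2, 29) and DayExists(2023, 4, 31) return false, which agrees with the hand-written checks.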

I hope that this whets your appetite for delving into Regular Expressions and shows you how powerful they can be when combined with other coding techniques to build up a more flexible system. I know you can use DateTime.Parse to do a lot of what I have just shown, but this was intended to demonstrate composition of regular expressions.
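For comparison, here is roughly what the framework route looks like for the YearMonthDay format with a slash separator. This is a sketch that uses DateTime.TryParseExact with the invariant culture, and FrameworkCheck is a hypothetical name.

```csharp
using System;
using System.Globalization;

public static class FrameworkCheck
{
    // Hypothetical helper: validates a YYYY/MM/DD date using the framework.
    public static bool IsValidYearMonthDay(string date)
    {
        DateTime parsed;
        // The quotes around the slash make it a literal character.
        return DateTime.TryParseExact(date, "yyyy'/'MM'/'dd",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);
    }
}
```

This accepts 2096/02/29 and rejects 2100/02/29 and 2023/04/31, but it knows nothing about the century constraints, so those would still need a separate check.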

Finally, I promised that we’d revisit the grab tags regex. Basically it allows you to grab tags regardless of whether or not they contain nested < characters.

2 thoughts on “Return to Regex.”

  1. As usual, you’ve done a good job 🙂 I’ve said it before, but I’ll say it again, I’m in love with the way you document your code! I just wish that others would be this willing in doing half as good a job.

    Can’t wait for the grab tags.

    Keep ’em coming 😀
