
I am getting mail archives with dates like these in them:

Wed, 17 Dec 1997 13:36:23 +2
Mon, 16 Jun 1997 15:41:52 EST
Tue, 15 Jul 1997 14:37:00 EDT
Tue, 5 Aug 1997 08:37:56 PST
Tue, 5 Aug 1997 15:46:16 PDT
Thu, 5 Mar 1998 08:44:19 MET
Mon, 8 Nov 1999 17:49:25 GMT
Thu, 24 Feb 94 20:06:06 MST
Mon, 19 Dec 2005 14:17:06 CST
Thu, 14 Sep 95 19:15 CDT
Sat, 22 Feb 1997 05:16:55 UT
Mon, 8 Jul 1996 15:48:54 GMT-5
Mon, 25 Nov 1996 17:10:28 WET
Mon, 6 Jan 1997 23:43:48 UT
Fri, 13 Jun 1997 16:44:03 -0400

The task is to convert these times to UTC. This is how I am trying to do it:

using System;
using System.Diagnostics;
using System.Globalization;

static void Main(string[] args)
{
    var possibleValues = new string[] 
    {
        "Mon, 29 Sep 2014 08:33:35 +0200"
        , "Fri, 29 Jun 2001 07:53:01 -0700"
        ,"Fri, 26 Sep 2014 15:57:04 +0000"
        ,"Wed, 17 Dec 1997 13:36:23 +2"
        , "Fri, 13 Jun 1997 16:44:03 -0400"

        , "Mon, 16 Jun 1997 15:41:52 EST"
        , "Tue, 15 Jul 1997 14:37:00 EDT"
        , "Tue, 5 Aug 1997 08:37:56 PST"
        , "Tue, 5 Aug 1997 15:46:16 PDT"
        , "Thu, 5 Mar 1998 08:44:19 MET"
        , "Mon, 8 Nov 1999 17:49:25 GMT"
        , "Thu, 24 Feb 94 20:06:06 MST"
        , "Mon, 19 Dec 2005 14:17:06 CST"
        , "Thu, 14 Sep 95 19:15:00 CDT"
        , "Sat, 22 Feb 1997 05:16:55 UT"
        , "Mon, 8 Jul 1996 15:48:54 GMT-5"
        , "Mon, 25 Nov 1996 17:10:28 WET"
        , "Mon, 6 Jan 1997 23:43:48 UT"

    };

    foreach (var item in possibleValues)
    {
        var dateParts = item.Split(' ');
        var lastItem = dateParts[dateParts.Length - 1];
        if (lastItem.StartsWith("+") || lastItem.StartsWith("-"))
        {
            try
            {
                DateTimeOffset offset = DateTimeOffset.Parse(item, CultureInfo.InvariantCulture);
                Debug.WriteLine("Input: {0}, UTC Time: {1}", item, offset.UtcDateTime);
            }
            catch (Exception exc)
            {
                Debug.WriteLine("Failed - {0}, Error Message: {1}", item, exc.Message);
            }
        }
        else
        {
            //Sometimes year is a two digit number and sometimes it is 4 digit number.
            string dateFormat = string.Format("ddd, {0} MMM {1} {2}:mm:ss {3}", new string('d', dateParts[1].Length), new string('y', dateParts[3].Length), int.Parse(dateParts[4].Substring(0, 2)) > 12 ? "HH" : "hh", lastItem);     
            try
            {
                DateTimeOffset offset = DateTimeOffset.ParseExact(item, dateFormat, CultureInfo.InvariantCulture, DateTimeStyles.None);
                Debug.WriteLine("Input: {0}, UTC Time: {1}", item, offset.UtcDateTime);
            }
            catch (Exception exc)
            {
                Debug.WriteLine("Failed - {0}, DateFormat Tried: {1}, Error Message: {2}", item, dateFormat, exc.Message);
            }
        }
    }
}

I am not able to figure out how to handle all the cases. I am open to using Noda Time too.

I have gone through many links from SO and Google looking for an answer, but wasn't able to implement any of the answers I found. If you know of a similar question, please let me know.

I have already gone through the links below:

Convert.ToDateTime Method
Converting between types
daylight-saving-time-and-time-zone-best-practices
SO Tags timezone
Coding Best Practices Using DateTime in the .NET Framework
conversion-of-a-utc-date-time-string-in-c-sharp

I have edited your title. Please see "Should questions include “tags” in their titles?", where the consensus is "no, they should not". - John Saunders
@JohnSaunders, thanks, I will keep this in mind. - ndd
The strings appear to mostly be RFC 822/1123 compliant, with the exception of the time zone abbreviations "WET" and "MET". Also, offsets of the form "GMT-5" and "+2" are not to spec, as that format requires values like "+0100". - Matt Johnson-Pint
@MattJohnson According to www.worldtimezone.com/wtz-names/wtz-met, MET is Middle-European Time (UTC+01). - Andrew Morton
@ndd You could look for non-standard time zone abbreviations and convert them to standard ones. However, note the problem with "CST", as explained by Jon Skeet in his talk "Jon Skeet and Tony the Pony" (Vimeo video), at the 20-minute mark. You could partially resolve that one by checking whether the email appears to come from a .com or .au address. - Andrew Morton

1 Answer


These dates appear to mostly be compliant with RFC 822 §5.1 as amended by RFC 1123 §5.2.14.

However, several of the time zones specified are not compliant.

  • "WET" is usually +0000
  • "MET" is rare, but is shown here as +0100.
  • "GMT-5" should be written as "-0500"
  • "+2" should be written as "+0200"
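If you would rather normalize the numeric offsets up front instead of relying on .NET's lenient parser, here is a minimal sketch. It assumes the offset is always the last space-separated token, and `OffsetNormalizer` and `Normalize` are hypothetical names, not part of any library.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class OffsetNormalizer
{
    // Optional "GMT"/"UTC" prefix, a sign, 1-2 hour digits, optional 2 minute digits.
    // Matches "+2", "GMT-5", "UTC+0530" and "-0400", but not "EST" or "MET".
    private static readonly Regex OffsetPattern =
        new Regex("^(?:GMT|UTC)?([+-])([0-9]{1,2})([0-9]{2})?$");

    // Rewrites an offset token into the RFC 822 "+hhmm"/"-hhmm" form.
    // Tokens that don't look like offsets (e.g. "EST") are returned unchanged.
    public static string Normalize(string token)
    {
        Match m = OffsetPattern.Match(token);
        if (!m.Success)
        {
            return token;
        }

        string sign = m.Groups[1].Value;
        int hours = int.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);
        int minutes = m.Groups[3].Success
            ? int.Parse(m.Groups[3].Value, CultureInfo.InvariantCulture)
            : 0;

        return sign + hours.ToString("00", CultureInfo.InvariantCulture)
                    + minutes.ToString("00", CultureInfo.InvariantCulture);
    }
}
```

For example, `Normalize("GMT-5")` returns "-0500" and `Normalize("+2")` returns "+0200", while named zones pass through untouched so they can be handled by a lookup table.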

That format only provides definitions for:

  • "UT" / "GMT" = +0000
  • "EDT" = -0400
  • "EST" / "CDT" = -0500
  • "CST" / "MDT" = -0600
  • "MST" / "PDT" = -0700
  • "PST" = -0800
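To make the arithmetic concrete: the offset is subtracted from the wall-clock time to get UT, so 15:41:52 EST (-0500) is 20:41:52 UT. The sketch below encodes the named zones listed above as `TimeSpan` values; `Rfc822Zones` and `ToUtc` are hypothetical helpers for illustration only.

```csharp
using System;
using System.Collections.Generic;

public static class Rfc822Zones
{
    // Named zones from RFC 822 section 5.1 (military single-letter zones omitted).
    public static readonly Dictionary<string, TimeSpan> Offsets =
        new Dictionary<string, TimeSpan>
        {
            { "UT",  TimeSpan.Zero },
            { "GMT", TimeSpan.Zero },
            { "EST", TimeSpan.FromHours(-5) },
            { "EDT", TimeSpan.FromHours(-4) },
            { "CST", TimeSpan.FromHours(-6) },
            { "CDT", TimeSpan.FromHours(-5) },
            { "MST", TimeSpan.FromHours(-7) },
            { "MDT", TimeSpan.FromHours(-6) },
            { "PST", TimeSpan.FromHours(-8) },
            { "PDT", TimeSpan.FromHours(-7) },
        };

    // Attaches a named zone to a wall-clock time and returns the UTC instant.
    // UTC = local time - offset, so 15:41:52 EST (-05:00) becomes 20:41:52 UTC.
    public static DateTime ToUtc(DateTime wallClock, string zone)
    {
        // The DateTimeOffset constructor rejects a Local/Utc Kind whose offset
        // doesn't match, so treat the input as an unspecified wall-clock time.
        DateTime unspecified = DateTime.SpecifyKind(wallClock, DateTimeKind.Unspecified);
        return new DateTimeOffset(unspecified, Offsets[zone]).UtcDateTime;
    }
}
```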

Note that under normal circumstances, any time zone abbreviation might be ambiguous. For example, there are 5 different meanings of "CST", as you can see in this list. It's only in this particular format that the abbreviation has specific context. In other words, while "CST" is a valid abbreviation for China Standard Time, you would never use CST in an RFC822/1123 formatted value. Instead you would use "+0800".

Now in .NET, the RFC822/1123 format is covered by the "R" standard format specifier. Normally, you could call DateTimeOffset.ParseExact or DateTime.ParseExact with the "R" specifier. However, you won't be able to use that here because it doesn't recognize any time zone abbreviation other than "GMT", nor does it work with offsets or two-digit years.
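A short check of that limitation, as a sketch: `RfcPatternDemo`, `TryParseR` and `Demo` are hypothetical names, and only `DateTimeOffset.TryParseExact` with the "R" specifier comes from .NET itself. The expectations in the comments follow from the "R" pattern (`ddd, dd MMM yyyy HH':'mm':'ss 'GMT'`), so it is worth confirming them on your runtime.

```csharp
using System;
using System.Globalization;

public static class RfcPatternDemo
{
    // "R" is the RFC 1123 pattern: ddd, dd MMM yyyy HH':'mm':'ss 'GMT'.
    public static bool TryParseR(string s, out DateTimeOffset result)
    {
        return DateTimeOffset.TryParseExact(
            s, "R", CultureInfo.InvariantCulture, DateTimeStyles.None, out result);
    }

    public static void Demo()
    {
        DateTimeOffset dto;
        // Literal "GMT" and a two-digit day: expected to succeed with a zero offset.
        Console.WriteLine(TryParseR("Mon, 08 Nov 1999 17:49:25 GMT", out dto));
        // Any other zone abbreviation, or a numeric offset: expected to fail.
        Console.WriteLine(TryParseR("Mon, 16 Jun 1997 15:41:52 EST", out dto));
        Console.WriteLine(TryParseR("Fri, 13 Jun 1997 16:44:03 -0400", out dto));
    }
}
```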

However, the non-exact parser (DateTimeOffset.Parse or DateTime.Parse) does seem to recognize most of the important bits, and we can take advantage of this. You'll have to do some pre-processing to assign a time zone offset that can be recognized.

using System;
using System.Collections.Generic;
using System.Globalization;

private static readonly Dictionary<string,string> TZMap = new Dictionary<string, string>
{
    // Defined by RFC822, but not known to .NET
    {"UT", "+0000"},
    {"EST", "-0500"},
    {"EDT", "-0400"},
    {"CST", "-0600"},
    {"CDT", "-0500"},
    {"MST", "-0700"},
    {"MDT", "-0600"},
    {"PST", "-0800"},
    {"PDT", "-0700"},

    // Extraneous, as found in your data
    {"WET", "+0000"},
    {"MET", "+0100"}
};

public static DateTimeOffset Parse(string s)
{
    // Get the time zone part of the string
    var tz = s.Substring(s.LastIndexOf(' ') + 1);

    // Replace time zones defined in the map
    string mapped;
    if (TZMap.TryGetValue(tz, out mapped))
    {
        s = s.Substring(0, s.Length - tz.Length) + mapped;
    }

    // Replace time zone offsets with leading characters
    if (tz.StartsWith("GMT+") || tz.StartsWith("GMT-") || tz.StartsWith("UTC+") || tz.StartsWith("UTC-"))
    {
        s = s.Substring(0, s.Length - tz.Length) + tz.Substring(3);
    }
        
    DateTimeOffset dto;
    if (DateTimeOffset.TryParse(s, CultureInfo.InvariantCulture, DateTimeStyles.None, out dto))
    {
        return dto;
    }

    throw new ArgumentException("Could not parse value: " + s);
}

This passes all of the sample values you provided; however, you'll probably find many more extraneous values that you'll need to add to the map. It may take several passes through your data before you identify all of the edge cases.
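One way to speed up those passes, sketched under the assumption that the zone is always the last space-separated token: collect every trailing token that is neither a numeric `+hhmm`/`-hhmm` offset nor already known, and count how often each occurs. `ZoneSurvey` and `UnknownZones` are hypothetical names. You might pass `TZMap.Keys` plus "GMT" as the known set; tokens such as "GMT-5" or "+2", which `Parse` handles by other means, will still be reported.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ZoneSurvey
{
    // Offsets already in RFC 822 "+hhmm"/"-hhmm" form need no mapping.
    private static readonly Regex NumericOffset = new Regex("^[+-][0-9]{4}$");

    // Returns each trailing zone token that is neither a numeric offset nor in
    // knownZones, together with the number of times it occurs in the input.
    public static Dictionary<string, int> UnknownZones(
        IEnumerable<string> dates, ICollection<string> knownZones)
    {
        return dates
            .Select(d => d.Trim())
            .Select(d => d.Substring(d.LastIndexOf(' ') + 1))
            .Where(tz => !NumericOffset.IsMatch(tz) && !knownZones.Contains(tz))
            .GroupBy(tz => tz)
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

Sorting the result by count shows which abbreviations are worth mapping first.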

And of course, since you're getting back a DateTimeOffset here, if you want the UTC value you can use .UtcDateTime, or .ToUniversalTime().
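The two options return different types, which matters if you keep the value around. A small illustration using the "-0400" sample from the question (the `UtcConversionDemo` class and its members are hypothetical names): `UtcDateTime` gives a `DateTime` whose `Kind` is `Utc`, while `ToUniversalTime()` keeps a `DateTimeOffset` whose `Offset` is zero.

```csharp
using System;

public static class UtcConversionDemo
{
    // The question's "Fri, 13 Jun 1997 16:44:03 -0400" sample.
    public static readonly DateTimeOffset Sample =
        new DateTimeOffset(1997, 6, 13, 16, 44, 3, TimeSpan.FromHours(-4));

    // A DateTime with Kind == Utc; the original offset is not kept.
    public static DateTime AsUtcDateTime(DateTimeOffset dto)
    {
        return dto.UtcDateTime;
    }

    // The same instant as a DateTimeOffset whose Offset is TimeSpan.Zero.
    public static DateTimeOffset AsUtcOffset(DateTimeOffset dto)
    {
        return dto.ToUniversalTime();
    }
}
```

Both describe the same instant (20:44:03 UTC); `DateTimeOffset` equality compares UTC instants, so `AsUtcOffset(Sample) == Sample` is true. The choice is only about which type and metadata you carry forward.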