I am trying to put together a regex that matches each of the date formats below.

* Mar 7, 2017
Mar. 7, 2017
* March 7, 2017
3-7-2017
03-07-2017
3-7-17
03-07-17
* 03/7/2017
* 03/07/17
* 3/7/17
Mar-07-2017
Mar-7-2017
March-07-2017

The regex below matches the date formats marked with an asterisk above. I have tried to extend it to cover the remaining formats, without success.

([0-9]+)/([0-9]+)/([0-9]+)|([12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))|\w+\s\d{2},\s\d{4}|(?i)\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b (?:0?[1-9]|[1-2][0-9]|3[01]),? \d{4}

Any help is always appreciated!

* Bonus question *

Sometimes there are multiple date matches, and I need to find the one that follows a certain word. In the past I've used the syntax below, placing the regex between the parentheses after the period.

(?<=Word).(StatementHere)
What about the (standard) date format 7/3/2017? You're excluding the majority of the world if you don't also accept D / M / Y. – Obsidian Age
While that may be true, I only need it to match the formats I listed. If there is a statement that can cover the format you listed, I will gladly accept that as well :). It just isn't required for my need. – JadonR
Are you using PCRE (php), or Perl or Ruby? – user557597
Not sure how to answer that as I don't necessarily use any of those. I copy the content into Notepad++ and then use the "Find" and set the search mode to use Regular Expressions. – JadonR
Notepad++ uses PCRE, but it doesn't matter. I was just going to optimize the regex with some functions. – user557597

1 Answer

Try this then ...

([0-9]+)/([0-9]+)/([0-9]+)|((0?[1-9]|1[0-2])-(0?[1-9]|[12]\d|3[01])-(\d{4}|\d{2}))|\w+\s\d{2},\s\d{4}|(?i)\b(Jan(?:uary|\.)?|Feb(?:ruary|\.)?|Mar(?:ch|\.)?|Apr(?:il|\.)?|May|Jun(?:e|\.)?|Jul(?:y|\.)?|Aug(?:ust|\.)?|Sep(?:tember|\.)?|Oct(?:ober|\.)?|Nov(?:ember|\.)?|Dec(?:ember|\.)?)([ ](?:0?[1-9]|[1-2][0-9]|3[01]),?[ ]|-(?:0?[1-9]|[1-2][0-9]|3[01])-)(\d{4})

https://regex101.com/r/k1vaVN/1
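
If you want to check the pattern outside Notepad++, here is a minimal sketch in Python. It is an adaptation, not the exact expression above: Python's re module does not accept a global (?i) in the middle of a pattern, so the flag is passed as re.IGNORECASE instead, and the numbered capture groups are turned into non-capturing ones. The names MONTH, DAY, DATE, SAMPLES and find_dates are made up for this example.

```python
import re

# Building blocks taken from the pattern above (capture groups made non-capturing).
MONTH = (
    r"(?:Jan(?:uary|\.)?|Feb(?:ruary|\.)?|Mar(?:ch|\.)?|Apr(?:il|\.)?|May"
    r"|Jun(?:e|\.)?|Jul(?:y|\.)?|Aug(?:ust|\.)?|Sep(?:tember|\.)?"
    r"|Oct(?:ober|\.)?|Nov(?:ember|\.)?|Dec(?:ember|\.)?)"
)
DAY = r"(?:0?[1-9]|[12][0-9]|3[01])"

# Same four alternatives, in the same order as the original pattern.
DATE = re.compile(
    "|".join([
        r"[0-9]+/[0-9]+/[0-9]+",                              # 3/7/17, 03/07/17
        r"(?:0?[1-9]|1[0-2])-" + DAY + r"-(?:\d{4}|\d{2})",   # 3-7-17, 03-07-2017
        r"\w+\s\d{2},\s\d{4}",                                # word + 2-digit day
        r"\b" + MONTH + r"(?:[ ]" + DAY + r",?[ ]|-" + DAY + r"-)\d{4}",
    ]),
    re.IGNORECASE,  # replaces the inline (?i), which Python rejects mid-pattern
)

# The thirteen formats listed in the question.
SAMPLES = [
    "Mar 7, 2017", "Mar. 7, 2017", "March 7, 2017",
    "3-7-2017", "03-07-2017", "3-7-17", "03-07-17",
    "03/7/2017", "03/07/17", "3/7/17",
    "Mar-07-2017", "Mar-7-2017", "March-07-2017",
]


def find_dates(text):
    """Return every substring of text that the adapted pattern matches."""
    return [m.group(0) for m in DATE.finditer(text)]


if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample, "->", bool(DATE.fullmatch(sample)))
```

fullmatch is used for the sample check so that a partial hit inside a longer string does not count as a pass.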

Readable version

    ( [0-9]+ )                    # (1)
    /
    ( [0-9]+ )                    # (2)
    /
    ( [0-9]+ )                    # (3)
 |  
    (                             # (4 start)
         ( 0? [1-9] | 1 [0-2] )        # (5)
         -
         ( 0? [1-9] | [12] \d | 3 [01] )  # (6)
         -
         ( \d{4} | \d{2} )             # (7)
    )                             # (4 end)
 |  
    \w+ \s \d{2} , \s \d{4} 
 |  
    (?i)
    \b 
    (                             # (8 start)
         Jan
         (?: uary | \. )?
      |  Feb
         (?: ruary | \. )?
      |  Mar
         (?: ch | \. )?
      |  Apr
         (?: il | \. )?
      |  May
      |  Jun
         (?: e | \. )?
      |  Jul
         (?: y | \. )?
      |  Aug
         (?: ust | \. )?
      |  Sep
         (?: tember | \. )?
      |  Oct
         (?: ober | \. )?
      |  Nov
         (?: ember | \. )?
      |  Dec
         (?: ember | \. )?
    )                             # (8 end)
    (                             # (9 start)
         [ ] 
         (?: 0? [1-9] | [1-2] [0-9] | 3 [01] )
         ,? [ ] 
      |  -
         (?: 0? [1-9] | [1-2] [0-9] | 3 [01] )
         -
    )                             # (9 end)
    ( \d{4} )                     # (10)
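
The same free-spacing layout can be used directly in code. The sketch below covers only the month-name branch (groups 8 to 10 above), written with Python's re.VERBOSE flag. The group names month, day_space, day_dash and year, and the helper parse_month_date, are hypothetical labels for this example. Python does not allow one group name to be used twice, so the day gets a separate name in each separator branch.

```python
import re

# Month-name branch only, in free-spacing (verbose) mode.
# Whitespace is ignored outside character classes, so [ ] is a literal space.
MONTH_DATE = re.compile(
    r"""
    \b
    (?P<month>
         Jan (?: uary | \. )?
      |  Feb (?: ruary | \. )?
      |  Mar (?: ch | \. )?
      |  Apr (?: il | \. )?
      |  May
      |  Jun (?: e | \. )?
      |  Jul (?: y | \. )?
      |  Aug (?: ust | \. )?
      |  Sep (?: tember | \. )?
      |  Oct (?: ober | \. )?
      |  Nov (?: ember | \. )?
      |  Dec (?: ember | \. )?
    )
    (?:
         [ ] (?P<day_space> 0? [1-9] | [12] [0-9] | 3 [01] ) ,? [ ]
      |  -   (?P<day_dash>  0? [1-9] | [12] [0-9] | 3 [01] ) -
    )
    (?P<year> \d{4} )
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_month_date(text):
    """Return (month, day, year) as strings for the first month-name date, else None."""
    m = MONTH_DATE.search(text)
    if m is None:
        return None
    # Only one of the two day groups participates in any given match.
    day = m.group("day_space") or m.group("day_dash")
    return (m.group("month"), day, m.group("year"))
```

For example, parse_month_date("March-07-2017") gives ("March", "07", "2017"), while a purely numeric date such as "3/7/17" gives None because this sketch leaves out the numeric branches.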

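Update
Just wrap the date alternatives in a (?: ) group, then put whatever qualifier you need in front of it. The \K discards everything matched before it, so only the date is reported as the match.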

word[ ]or[ ]phrase[ ]+\K(?:([0-9]+)/([0-9]+)/([0-9]+)|((0?[1-9]|1[0-2])-(0?[1-9]|[12]\d|3[01])-(\d{4}|\d{2}))|\w+\s\d{2},\s\d{4}|(?i)\b(Jan(?:uary|\.)?|Feb(?:ruary|\.)?|Mar(?:ch|\.)?|Apr(?:il|\.)?|May|Jun(?:e|\.)?|Jul(?:y|\.)?|Aug(?:ust|\.)?|Sep(?:tember|\.)?|Oct(?:ober|\.)?|Nov(?:ember|\.)?|Dec(?:ember|\.)?)([ ](?:0?[1-9]|[1-2][0-9]|3[01]),?[ ]|-(?:0?[1-9]|[1-2][0-9]|3[01])-)(\d{4}))
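
Not every engine has \K. Python's re module, for one, does not support it, and there the usual substitute is to capture the date in a group after the keyword. A minimal sketch of that idea follows. The names DATE_SRC and date_after are made up for this example, and the date pattern is the same adaptation of the expression above with non-capturing groups. The scoped (?i:...) keeps the keyword case-sensitive, as in the pattern above where (?i) only takes effect from the month names onward.

```python
import re

MONTH = (
    r"(?:Jan(?:uary|\.)?|Feb(?:ruary|\.)?|Mar(?:ch|\.)?|Apr(?:il|\.)?|May"
    r"|Jun(?:e|\.)?|Jul(?:y|\.)?|Aug(?:ust|\.)?|Sep(?:tember|\.)?"
    r"|Oct(?:ober|\.)?|Nov(?:ember|\.)?|Dec(?:ember|\.)?)"
)
DAY = r"(?:0?[1-9]|[12][0-9]|3[01])"

# Source text of the date alternatives, without any keyword in front.
DATE_SRC = "|".join([
    r"[0-9]+/[0-9]+/[0-9]+",
    r"(?:0?[1-9]|1[0-2])-" + DAY + r"-(?:\d{4}|\d{2})",
    r"\w+\s\d{2},\s\d{4}",
    r"\b" + MONTH + r"(?:[ ]" + DAY + r",?[ ]|-" + DAY + r"-)\d{4}",
])


def date_after(keyword, text):
    """Return the first date that directly follows keyword plus spaces, else None."""
    # re.escape makes the keyword literal; group 1 plays the role of \K.
    pattern = re.compile(re.escape(keyword) + r"[ ]+((?i:" + DATE_SRC + r"))")
    m = pattern.search(text)
    return m.group(1) if m else None
```

With this, date_after("Due", "Sent 3/7/17. Due Mar-07-2017.") skips the first date and returns "Mar-07-2017".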