Regular Expressions matching dates (greedy)

Question

I have the following dates in a text file:

04/20/2009;04/20/09;4/20/09;4/3/09;

Mar-20-2009;Mar 20, 2009;March 20, 2009;Mar. 20, 2009;Mar 20 2009;

20 Mar 2009;20 March 2009;20 Mar. 2009;20 March, 2009;

Mar 20th, 2009;Mar 21st, 2009;Mar 22nd, 2009;

Feb 2009; Sep 2009; Oct 2010;

6/2008;12/2009;

2009;2010

I am trying to match the content on line 5 (Feb 2009; Sep 2009; Oct 2010;) without capturing any of the other dates.

I have written the following regular expression, but it's capturing parts of the other dates as well:

expr_5 = re.findall(r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\s\d{4}',date)

out:

Expr list 5 : [(11, ['Mar 2009']), (12, ['March 2009']), (20, ['Feb 2009']), (21, ['Sep 2009']), (22, ['Oct 2010'])]

Note that the number in front of each result is just an index to easily identify the position of the date in the list. How do I get rid of the dates at index 11 and 12? (They are part of the dates from line 3.)

Alternatively,

The expression below captures all of the dates on line 3. Is there a way to extend this expression so that it captures all the dates on line 5 as well (everything from line 3 and line 5)?

expr_3 = re.findall(r'\d{2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*[\s.,]*[\s]?\d{4}',date)

out:

Expr list 3 : [(11, ['20 Mar 2009']), (12, ['20 March 2009']), (13, ['20 Mar. 2009']), (14, ['20 March, 2009'])]

Nezuko Nezuko · Accepted Answer · 2020-08-27T05:38:49

Try this one.

import re


s = """
04/20/2009;04/20/09;4/20/09;4/3/09;

Mar-20-2009;Mar 20, 2009;March 20, 2009;Mar. 20, 2009;Mar 20 2009;

20 Mar 2009;20 March 2009;20 Mar. 2009;20 March, 2009;

Mar 20th, 2009;Mar 21st, 2009;Mar 22nd, 2009;

Feb 2009; Sep 2009; Oct 2010;

6/2008;12/2009;

2009;2010
"""


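# (^|; ) anchors each match at the start of a line (re.M) or right after "; "
reg = re.compile(r"(^|; )\w{3} \d{4}", re.M)
# m.group() keeps the leading "; ", so joining rebuilds "Feb 2009; Sep 2009; Oct 2010"
match = ''.join([m.group() for m in reg.finditer(s)])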

# gives you the matched string
print(match)

# If you just want to get the dates
dates = match.split('; ')
print(*dates, sep='\n')

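In the regex pattern, \w{3} matches exactly three word characters (letters, digits, or underscore), and (^|; ) requires the match to begin either at the start of a line (^, which matches at every line start because of the re.M flag) or right after a semicolon followed by a space. The "Mar 2009" inside "20 Mar 2009" is therefore skipped, because it is preceded by a day number rather than by a line start or "; ".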
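As a rough sketch of a stricter variant (not part of the answer above), the month alternation from the question can be kept and a negative lookbehind added, so that a month preceded by a day number, as in "20 Mar 2009", is rejected. The names MONTH, month_year and day_month_year are made up for this illustration. The second pattern, which makes the day optional to cover lines 3 and 5 together, is only one possible reading of the "Alternatively" part of the question.

```python
import re

text = """\
04/20/2009;04/20/09;4/20/09;4/3/09;
Mar-20-2009;Mar 20, 2009;March 20, 2009;Mar. 20, 2009;Mar 20 2009;
20 Mar 2009;20 March 2009;20 Mar. 2009;20 March, 2009;
Mar 20th, 2009;Mar 21st, 2009;Mar 22nd, 2009;
Feb 2009; Sep 2009; Oct 2010;
6/2008;12/2009;
2009;2010
"""

# Hypothetical helper: the month alternation taken from the question.
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"

# Line 5 only: (?<!\d ) rejects a month that directly follows "<digit><space>",
# which is what distinguishes "20 Mar 2009" from a bare "Feb 2009".
month_year = re.compile(rf"(?<!\d ){MONTH}\s\d{{4}}")

# Lines 3 and 5 together: the leading day becomes optional.
day_month_year = re.compile(rf"(?:\d{{1,2}}\s)?{MONTH}[\s.,]*\d{{4}}")

print(month_year.findall(text))
# ['Feb 2009', 'Sep 2009', 'Oct 2010']

print(day_month_year.findall(text))
# ['20 Mar 2009', '20 March 2009', '20 Mar. 2009', '20 March, 2009',
#  'Feb 2009', 'Sep 2009', 'Oct 2010']
```

Both patterns use only non-capturing groups, so re.findall returns the full matched text instead of group contents. Unlike \w{3}, the explicit month list will not accept an arbitrary three-character token such as "abc 2009".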
