I have the following dates in a text file:
04/20/2009;04/20/09;4/20/09;4/3/09;
Mar-20-2009;Mar 20, 2009;March 20, 2009;Mar. 20, 2009;Mar 20 2009;
20 Mar 2009;20 March 2009;20 Mar. 2009;20 March, 2009;
Mar 20th, 2009;Mar 21st, 2009;Mar 22nd, 2009;
Feb 2009; Sep 2009; Oct 2010;
6/2008;12/2009;
2009;2010
I am trying to match the content in line 5 (Feb 2009; Sep 2009; Oct 2010;) without capturing any of the other dates.
I have written the following regular expression, but it's capturing parts of the other dates as well:
expr_5 = re.findall(r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\s\d{4}',date)
out:
Expr list 5 : [(11, ['Mar 2009']), (12, ['March 2009']), (20, ['Feb 2009']), (21, ['Sep 2009']), (22, ['Oct 2010'])]
Note that the number in front of each match is just an index that makes it easy to identify the position of the date in the list. How do I exclude the matches at indices 11 and 12? (They are parts of the dates on line 3.)
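One direction that might work, shown here only as a minimal sketch: a negative lookbehind `(?<!\d\s)` rejects a month name that is directly preceded by a digit and a whitespace character, which is what distinguishes "20 Mar 2009" from "Feb 2009". The names `text`, `MONTH`, and `month_year` are illustrative, and the sample string is copied from the dates above instead of being read from the file.

```python
import re

# Sample data copied from the question (assumption: the real input comes from a file).
text = """04/20/2009;04/20/09;4/20/09;4/3/09;
Mar-20-2009;Mar 20, 2009;March 20, 2009;Mar. 20, 2009;Mar 20 2009;
20 Mar 2009;20 March 2009;20 Mar. 2009;20 March, 2009;
Mar 20th, 2009;Mar 21st, 2009;Mar 22nd, 2009;
Feb 2009; Sep 2009; Oct 2010;
6/2008;12/2009;
2009;2010"""

# Month abbreviation optionally followed by the rest of the month name.
MONTH = r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*'

# (?<!\d\s) : the month must NOT be preceded by "digit + whitespace" (a day number).
# \b        : word boundaries so the month and the year are matched as whole tokens.
month_year = re.compile(r'(?<!\d\s)\b' + MONTH + r'\s\d{4}\b')

matches = month_year.findall(text)
print(matches)  # ['Feb 2009', 'Sep 2009', 'Oct 2010']
```

The lookbehind has a fixed width of two characters, which Python's `re` module requires. It only guards against a day number separated from the month by whitespace, so other layouts would need their own checks.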
Alternatively, the expression below captures all of the dates on line 3. Is there a way to extend it so that it also captures the dates on line 5 (everything from line 3 and line 5)?
expr_3 = re.findall(r'\d{2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*[\s.,]*[\s]?\d{4}',date)
out:
Expr list 3 : [(11, ['20 Mar 2009']), (12, ['20 March 2009']), (13, ['20 Mar. 2009']), (14, ['20 March, 2009'])]
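For the combined case, a possible sketch is to make the leading day optional with a non-capturing group `(?:\d{2}\s)?`, so the same pattern accepts both "20 Mar 2009" and "Feb 2009". The names `text`, `MONTH`, and `day_optional` are illustrative, and the sample string is copied from the dates above instead of being read from the file.

```python
import re

# Sample data copied from the question (assumption: the real input comes from a file).
text = """04/20/2009;04/20/09;4/20/09;4/3/09;
Mar-20-2009;Mar 20, 2009;March 20, 2009;Mar. 20, 2009;Mar 20 2009;
20 Mar 2009;20 March 2009;20 Mar. 2009;20 March, 2009;
Mar 20th, 2009;Mar 21st, 2009;Mar 22nd, 2009;
Feb 2009; Sep 2009; Oct 2010;
6/2008;12/2009;
2009;2010"""

# Month abbreviation optionally followed by the rest of the month name.
MONTH = r'(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*'

# (?:\d{2}\s)? : optional two-digit day followed by whitespace.
# [\s.,]*      : optional separators between the month and the year.
# \d{4}\b      : four-digit year that ends at a word boundary.
day_optional = re.compile(r'\b(?:\d{2}\s)?' + MONTH + r'[\s.,]*\d{4}\b')

combined = day_optional.findall(text)
for index, match in enumerate(combined):
    print(index, match)
```

On this sample the pattern returns the four dates from line 3 followed by the three dates from line 5. Formats such as "Mar 20, 2009" are skipped because the token after the month is not a four-digit year.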