I am trying to write a RegEx statement to locate the first date BEFORE a specific word.

I've used the regex below to find the first date AFTER a specific word.

Word +\K(?:([0-9]+)/([0-9]+)/([0-9]+)|((0?[1-9]|1[0-2])-(0?[1-9]|[12]\d|3[01])-(\d{4}|\d{2}))|\w+\s\d{2},\s\d{4}|(?i)\b(Jan(?:uary|.)?|Feb(?:ruary|.)?|Mar(?:ch|.)?|Apr(?:il|.)?|May|Jun(?:e|.)?|Jul(?:y|.)?|Aug(?:ust|.)?|Sep(?:tember|.)?|Oct(?:ober|.)?|Nov(?:ember|.)?|Dec(?:ember|.)?)( ,?[ ]|-(?:0?[1-9]|[1-2][0-9]|3[01])-)(\d{4}))

Here is an example of what I want it to return.

Many words here 01/07/2019 02/03/2019 02/08/2019 More words here

In this case it should return the date 02/08/2019. How can I change the above statement to locate a date BEFORE a specified word?

I test in Notepad++, in case that helps determine which regex flavor applies.

Bonus question: sometimes the word to match on may be on a new line. Can regex still match on that? For example it may be formatted as shown below where the word "More" is on a new line:

Many words here 
01/07/2019 
02/03/2019 
02/08/2019 
More words here
1 Answer

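You could drop the leading Word +\K and add a positive lookahead, (?=\h+More\b), at the end of your date-like pattern. The lookahead asserts that the date is followed by one or more horizontal whitespace characters, then the word More and a word boundary, without including any of that in the match.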

(?:([0-9]+)/([0-9]+)/([0-9]+)|((0?[1-9]|1[0-2])-(0?[1-9]|[12]\d|3[01])-(\d{4}|\d{2}))|\w+\s\d{2},\s\d{4}|(?i)\b(Jan(?:uary|.)?|Feb(?:ruary|.)?|Mar(?:ch|.)?|Apr(?:il|.)?|May|Jun(?:e|.)?|Jul(?:y|.)?|Aug(?:ust|.)?|Sep(?:tember|.)?|Oct(?:ober|.)?|Nov(?:ember|.)?|Dec(?:ember|.)?)( ,?[ ]|-(?:0?[1-9]|[1-2][0-9]|3[01])-)(\d{4}))(?=\h+More\b)
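To see the lookahead idea outside Notepad++, here is a minimal Python sketch. It rests on a few assumptions that are not in the original: the date alternation is cut down to plain slash dates, `[ \t]+` stands in for `\h` (Python's `re` module does not support `\h`), and the helper name `date_before_word` is made up for the example.

```python
import re

# Assumption: a simplified stand-in for the full date alternation (slash dates only).
DATE = r"\d{1,2}/\d{1,2}/\d{2,4}"


def date_before_word(text, word):
    """Return the date directly followed by spaces/tabs and `word`, or None."""
    # Python's re has no \h, so [ \t]+ approximates "horizontal whitespace".
    # The lookahead checks what follows the date without consuming it.
    pattern = re.compile(DATE + r"(?=[ \t]+" + re.escape(word) + r"\b)")
    match = pattern.search(text)
    return match.group(0) if match else None


sample = "Many words here 01/07/2019 02/03/2019 02/08/2019 More words here"
print(date_before_word(sample, "More"))
```

The earlier dates are skipped because each of them is followed by another date rather than by the word, so the lookahead fails there and the search moves on.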

If the word can be on a new line, change \h to \s, which also matches line breaks.
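The multi-line case can be sketched the same way in Python, again under assumptions of my own: a simplified slash-date pattern instead of the full alternation, and a hypothetical helper named `date_before_word_multiline`. Here `\s+` is used in the lookahead, so the word may sit on the next line.

```python
import re

# Assumption: a simplified stand-in for the full date alternation (slash dates only).
DATE = r"\d{1,2}/\d{1,2}/\d{2,4}"


def date_before_word_multiline(text, word):
    """Return the date followed by any whitespace (including newlines) and `word`."""
    # \s+ matches spaces, tabs and line breaks, so the word can start a new line.
    pattern = re.compile(DATE + r"(?=\s+" + re.escape(word) + r"\b)")
    match = pattern.search(text)
    return match.group(0) if match else None


multiline_sample = (
    "Many words here \n"
    "01/07/2019 \n"
    "02/03/2019 \n"
    "02/08/2019 \n"
    "More words here"
)
print(date_before_word_multiline(multiline_sample, "More"))
```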