
I need a regular expression that captures everything preceding the colon, but only within the first paragraph of a multi-paragraph string.

Input1:

Rochester: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum

Output1:

Rochester

Everything that precedes the colon in the first paragraph.

Input2:

Rochester Hills: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum

Output2:

Rochester Hills

Everything that precedes the colon in the first paragraph.

Input3:

Rochester Hills: Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Sisters: sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum

Output3:

Rochester Hills

Only that which precedes the colon in the first paragraph, completely ignoring the string that precedes the colon in a later paragraph.

Input4:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

Sisters: sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum

Output4:

Nothing should be captured, because no colon appears in the first paragraph.

Thanks!

EDIT: Sorry, I forgot to mention my previous attempt. I had been working with:

(?=.*:[ ]).*?(?=[:][ ])

This works in another case, where a colon never appears in the later paragraphs. For this case, though, I could not work out how to modify it so that it only looks within the first paragraph.
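To make the failure concrete, here is a minimal sketch that runs the pattern above against a shortened stand-in for Input4. It uses Python's `re` module purely as an illustration; whether Mozenda's regex engine behaves the same way is an assumption, and the sample text is abbreviated.

```python
import re

# The pattern from the question, unchanged.
ATTEMPT = re.compile(r"(?=.*:[ ]).*?(?=[:][ ])")

# Shortened stand-in for Input4: no colon in the first paragraph.
text = (
    "Lorem ipsum dolor sit amet.\n\n"
    "Ut enim ad minim veniam.\n\n"
    "Sisters: sint occaecat"
)

# search() tries every starting position, so it skips the first two
# paragraphs and succeeds at the start of the third one.
match = ATTEMPT.search(text)
found = match.group(0) if match else None
print(found)  # Sisters
```

Nothing ties the match to the start of the string, so the search simply moves on until it reaches a line that contains ": ".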

What programming environment? Most regex tools are terrible with newlines... - Willem Van Onsem
@CommuSoft This is for a regular expression within the web scraping software Mozenda. - chewyg

1 Answer


This should give you the expected results:

^[^\n\r]+(?=:)

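Starting at the beginning of the string, it matches one or more characters that are not a newline or carriage return, and it only succeeds where that run is immediately followed by a colon. Because `^` anchors the match to the start of the string (assuming multiline mode is off), later paragraphs are never considered. I tested it, and it gives the desired results for all of your examples.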
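As a rough check, the pattern can be run against shortened versions of the four inputs. This sketch uses Python's `re` module as a stand-in, which is an assumption since Mozenda's engine may differ in details; the helper name `prefix_before_colon` and the `STRICT` variant are hypothetical additions for illustration only.

```python
import re

# Pattern from the answer. Without re.MULTILINE, ^ matches only at the
# very start of the string, so later paragraphs cannot match.
ANSWER = re.compile(r"^[^\n\r]+(?=:)")

# Hypothetical stricter variant: also excludes ':' from the character
# class, so the match stops at the FIRST colon on the first line.
STRICT = re.compile(r"^[^:\n\r]+(?=:)")

def prefix_before_colon(text, pattern=ANSWER):
    """Return the text before the colon in the first paragraph, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None

tail = "\n\nUt enim ad minim veniam.\n\nSisters: sint occaecat"
input1 = "Rochester: Lorem ipsum." + tail
input3 = "Rochester Hills: Lorem ipsum." + tail
input4 = "Lorem ipsum dolor sit amet." + tail

print(prefix_before_colon(input1))  # Rochester
print(prefix_before_colon(input3))  # Rochester Hills
print(prefix_before_colon(input4))  # None

# The greedy class runs to the last colon on the first line.
two_colons = "Note: meet at 10:30" + tail
print(prefix_before_colon(two_colons))          # Note: meet at 10
print(prefix_before_colon(two_colons, STRICT))  # Note
```

The last two lines show one caveat: if the first paragraph can contain more than one colon, the original pattern captures everything up to the last of them, while the variant that excludes `:` from the character class stops at the first.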