Separating words with Regex (Not in specific order)

Question

I want to extract items from a text. For example, the following sentence contains several items, each starting with a capital letter followed by a period. How can I separate them?

Text:

A. lorem ipsum dolor sit B . 41dipiscing elit sedC. lorem ipsum dolor sit amet D. 35 Consectetur adipiscing E .Sed do eiusmod tempor

Goal:

A. lorem ipsum dolor sit 
B . 41dipiscing elit sed 
C. lorem ipsum dolor sit amet 
D. 35 Consectetur adipiscing 
E .Sed do eiusmod tempor

What I have tried:

^(([a-zA-Z]{1}|[0-9]+)\s*[.,]{1})(.*)$

Result:

https://regex101.com/r/4HB0oD/1

But my regex only matches the first sentence and does not detect the others. What is the reason for this?

Note that the quantifier {1} is inherently redundant. If you want to match something once, simply don't add a quantifier. — CAustin
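A small Python check, added here as an illustration rather than taken from the original thread, shows the reason. Without the multiline flag, ^ can only match at the very start of the string, and the greedy (.*)$ then consumes everything up to the end. The engine therefore reports a single match, and the remaining items (B through E) end up inside group 3. The sample text is a single line, so enabling the m flag on regex101 does not change this.

```python
import re

text = ("A. lorem ipsum dolor sit B . 41dipiscing elit sedC. lorem ipsum dolor sit amet "
        "D. 35 Consectetur adipiscing E .Sed do eiusmod tempor")

# The pattern from the question: anchored with ^ ... $ and ending in a greedy (.*)
pattern = r'^(([a-zA-Z]{1}|[0-9]+)\s*[.,]{1})(.*)$'

# findall returns one tuple per match: (group 1, group 2, group 3)
matches = re.findall(pattern, text)

print(len(matches))   # 1: only one match is found
print(matches[0][0])  # A.
print(matches[0][2])  # group 3 swallowed every remaining item, B through E
```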

Emma Emma · Accepted Answer · 2019-12-13T18:53:53

Maybe,

(?=[A-Z]\s*\.)

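might work OK. It is a zero-width lookahead: it matches the empty position right before a capital letter that is followed by optional whitespace and a period, so substituting a newline at that position splits the text without removing any characters.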
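To check what the lookahead actually matches, the sketch below lists every match in the sample text (the variable names are my own). Each match is empty, and the character sitting right after it is one of the item labels.

```python
import re

text = ("A. lorem ipsum dolor sit B . 41dipiscing elit sedC. lorem ipsum dolor sit amet "
        "D. 35 Consectetur adipiscing E .Sed do eiusmod tempor")

# (?=...) only asserts what follows; it consumes no characters
lookahead = re.compile(r'(?=[A-Z]\s*\.)')

matches = list(lookahead.finditer(text))
labels = [text[m.start()] for m in matches]            # character right after each match
all_empty = all(m.start() == m.end() for m in matches)  # every match has zero width

print(labels)     # ['A', 'B', 'C', 'D', 'E']
print(all_empty)  # True
```

Capitals such as the C in "Consectetur" or the S in "Sed" are not matched because no period follows them. Note, however, that any capital letter directly followed by optional whitespace and a period triggers a match, so an abbreviation such as "U.S." inside an item would be split as well.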


Test

import re

string = '''
A. lorem ipsum dolor sit B . 41dipiscing elit sedC. lorem ipsum dolor sit amet D. 35 Consectetur adipiscing E .Sed do eiusmod tempor
'''

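# Insert a newline before every capital letter that is followed by optional whitespace and a period
print(re.sub(r'(?=[A-Z]\s*\.)', '\n', string))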

Output


A. lorem ipsum dolor sit 
B . 41dipiscing elit sed
C. lorem ipsum dolor sit amet 
D. 35 Consectetur adipiscing 
E .Sed do eiusmod tempor
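If a list is more convenient than a single newline-separated string, the same lookahead can be passed to re.split. This is a sketch, not part of the original answer, and it assumes Python 3.7 or newer, since earlier versions do not support splitting on a pattern that matches only an empty string.

```python
import re

text = ("A. lorem ipsum dolor sit B . 41dipiscing elit sedC. lorem ipsum dolor sit amet "
        "D. 35 Consectetur adipiscing E .Sed do eiusmod tempor")

# Splitting on a zero-width pattern requires Python 3.7 or newer
parts = re.split(r'(?=[A-Z]\s*\.)', text)

# The first piece is '' because a match occurs at position 0; strip() also
# removes the trailing space left at the end of several items
items = [p.strip() for p in parts if p.strip()]

for item in items:
    print(item)  # prints the five items, one per line
```

Filtering out empty pieces also keeps the result clean when the input starts or ends with a newline, as the triple-quoted string in the answer does.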

If you wish to simplify, update, or explore the expression, it is explained in the top-right panel of regex101.com. You can also watch the matching steps, or modify them, in the regex101 debugger, which demonstrates how a regex engine might consume a sample input string step by step and perform the matching.
