
My regex:

联系人[::]\s*([^\s,,、]+)(?:[\s,,、]{1,2}([^\s,,、]+))*

Test string:

联系人: 啊啊,实打实大, 好说歹说、实打实  实打实大


>>> import regex as re
>>> p = r'联系人[::]\s*([^\s,,、]+)(?:[\s,,、]{1,2}([^\s,,、]+))*'
>>> s = '联系人: 啊啊,实打实大, 好说歹说、实打实  实打实大'
>>> re.findall(p, s)
[('啊啊', '实打实大')]

#  finditer
>>> for i in re.finditer(p, s):
...     print(i.groups())
('啊啊', '实打实大')



You can test it at https://regex101.com/ (regex101 can't save regexes right now, so I had to post screenshots instead).

I want to capture every item separated by [\s,,、], but only the first and the last items are returned. I can't see anything wrong with my regex, yet the result is wrong, and this has had me stuck for half an hour...

It is not possible to keep repeated captures with Python re; you can access them with the PyPI regex module, though. - Wiktor Stribiżew
@Wiktor Stribiżew I have tried regex; unfortunately, the result is the same. See my sample code. - Mithril
You did not use it correctly. Use regex.search if you expect a single match, or regex.finditer to get multiple matches, and then access the corresponding group's captures. See "RegEx: Find all digits after certain string". - Wiktor Stribiżew
It looks like you are using Python 3.x, right? - Wiktor Stribiżew
@Wiktor Stribiżew Sorry, I forgot to provide the version; it is Python 3.6. And finditer is just more memory-efficient than findall; the results are the same. - Mithril

1 Answer


As I mentioned in my comments, you need to use re.search (to get a single match only) or re.finditer (to get multiple matches) and access the corresponding group captures (in your case, it is captures(2)):

>>> import regex as re
>>> p = r'联系人[::]\s*([^\s,,、]+)(?:[\s,,、]{1,2}([^\s,,、]+))*'
>>> s = '联系人: 啊啊,实打实大, 好说歹说、实打实  实打实大'
>>> res = []
>>> for x in re.finditer(p, s):
...     res.append(x.captures(2))  # every capture of group 2, not just the last
>>> print(res)
[['实打实大', '好说歹说', '实打实', '实打实大']]

>>> m = re.search(p, s)
>>> if m:
...     print(m.captures(2))  # every capture of group 2 for the single match
['实打实大', '好说歹说', '实打实', '实打实大']
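
Note that captures(2) does not include the first item (啊啊), because that one is held by group 1. If installing the PyPI regex module is not an option, a two-step approach with the standard-library re may also work: match everything after the label first, then split it on the separators. The following is only a minimal sketch under some assumptions. The helper name extract_contacts is hypothetical, the list is assumed to run to the end of the line, and runs of separators of any length are accepted (the original pattern allows at most two).

```python
import re

# Assumption: the item list runs from the label to the end of the line.
LIST_RE = re.compile(r'联系人[::]\s*(.+)')
# Same separator set as in the question, but any run length is accepted.
SEP_RE = re.compile(r'[\s,,、]+')

def extract_contacts(text):
    """Hypothetical helper: return every item after the label as a list."""
    m = LIST_RE.search(text)
    if not m:
        return []
    # Split the captured tail on separators and drop empty strings,
    # which can appear if the tail ends with a separator.
    return [item for item in SEP_RE.split(m.group(1)) if item]

s = '联系人: 啊啊,实打实大, 好说歹说、实打实  实打实大'
print(extract_contacts(s))
```

Unlike the captures(2) result above, this returns the first item too, since nothing is tied to a particular capture group.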