
My regex:

联系人[::]\s{1,2}([^\s,,、]+)(?:[\s,,、]{1,2}([^\s,,、]+))*

Test string:

联系人: 啊啊,实打实大, 好说歹说、实打实  实打实大

Code

>>> import regex as re
>>> p = r'联系人[::]\s*([^\s,,、]+)(?:[\s,,、]{1,2}([^\s,,、]+))*'
>>> s = '联系人: 啊啊,实打实大, 好说歹说、实打实  实打实大'
>>> re.findall(p, s)
[('啊啊', '实打实大')]

#  finditer
>>> for i in re.finditer(p, s):
...     print(i.groups())
...
('啊啊', '实打实大')

Matches:

[two screenshots of the regex101 match results]

You can test it at https://regex101.com/ (regex101 can't save regexes right now, so I had to post the screenshots above).


I want every item separated by [\s,,、] to be captured, but only the first and the last are returned. I can't see anything wrong with my regex, yet the result is wrong. I have been stuck on this for half an hour...

It is not possible to keep repeated captures with Python's re, but you can access them with the PyPI regex module. - Wiktor Stribiżew
@Wiktor Stribiżew I have tried regex; unfortunately, the result is the same. See my sample code. - Mithril
You did not use it correctly. Use regex.search if you expect a single match or regex.finditer to get multiple matches, and then access the corresponding group's captures. See RegEx: Find all digits after certain string. - Wiktor Stribiżew
It looks like you are using Python 3.x, right? - Wiktor Stribiżew
@Wiktor Stribiżew Sorry, I forgot to provide the version; it is Python 3.6. Also, finditer is just more memory-efficient than findall; the results are the same. - Mithril

1 Answer


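As I mentioned in my comments, you need to use re.search (to get a single match only) or re.finditer (to get multiple matches), where re is the PyPI regex module imported under that name, and then access the captures of the corresponding group. In your case, that is captures(2), which returns every value group 2 captured rather than only the last one: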

>>> import regex as re
>>> p = r'联系人[::]\s*([^\s,,、]+)(?:[\s,,、]{1,2}([^\s,,、]+))*'
>>> s = '联系人: 啊啊,实打实大, 好说歹说、实打实  实打实大'
>>> res = []
>>> for x in re.finditer(p, s):
...     res.append(x.captures(2))
...
>>> print(res)
[['实打实大', '好说歹说', '实打实', '实打实大']]

>>> m = re.search(p, s)
>>> if m:
...     print(m.captures(2))
...
['实打实大', '好说歹说', '实打实', '实打实大']
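
Note that captures(2) does not include the first name, because that one is captured by group 1. If installing the regex module is not an option, a possible workaround with the standard-library re is to capture the whole delimited list as a single group and then pull the items out in a second pass. The following is only a minimal sketch of that different technique, not something from the question: the names ITEM, SEP, LIST_PATTERN and extract_contacts are made up for illustration, and the character classes are copied from the question as-is.

```python
import re  # standard library only; the third-party `regex` module is not used here

# Building blocks copied from the question's pattern.
ITEM = r'[^\s,,、]+'   # one name: anything except whitespace and the delimiters
SEP = r'[\s,,、]'      # a single delimiter character

# Hypothetical variant of the pattern: the whole list is ONE capturing group,
# so nothing is lost when the inner (non-capturing) group repeats.
LIST_PATTERN = re.compile(rf'联系人[::]\s*({ITEM}(?:{SEP}{{1,2}}{ITEM})*)')

# The question's original pattern, kept to show what stdlib `re` reports.
ORIGINAL_PATTERN = re.compile(
    r'联系人[::]\s*([^\s,,、]+)(?:[\s,,、]{1,2}([^\s,,、]+))*'
)


def extract_contacts(text):
    """Return every name in the first contact list found in `text`."""
    m = LIST_PATTERN.search(text)
    if m is None:
        return []
    # Second pass: collect each item from the captured list.
    return re.findall(ITEM, m.group(1))


s = '联系人: 啊啊,实打实大, 好说歹说、实打实  实打实大'

# A repeated group in stdlib `re` only remembers its last repetition.
print(ORIGINAL_PATTERN.search(s).groups())

# The two-pass version returns all five names, including the first one.
print(extract_contacts(s))
```

With the regex module, concatenating m.captures(1) and m.captures(2) should give the same complete list in a single pass.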