Let's say I have a list of regexes like this (a simplified example; the real code has more complex regexes):
regs = [r'apple', 'strawberry', r'pear', r'.*berry', r'fruit: [a-z]*']
I want to match the entire string against the regexes above (i.e. each one treated as ^regex$) and return the index of the one that matched. If several match, I want the leftmost one. So find('strawberry') should return 1 (even though r'.*berry' also matches), while find('blueberry') should return 3. I'm going to reuse the same set of regexes a lot, so precomputation is fine.
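To make the "leftmost" rule concrete, here is a small illustration that uses re.fullmatch (which requires the whole string to match, like ^regex$) to list every index that matches; the desired result is the first entry. The helper name all_matches is made up for this example.

```python
import re

regs = [r'apple', r'strawberry', r'pear', r'.*berry', r'fruit: [a-z]*']

def all_matches(s):
    # Indices of every regex that matches the *whole* string s, in list order.
    return [i for i, reg in enumerate(regs) if re.fullmatch(reg, s)]

print(all_matches('strawberry'))  # [1, 3]: both 'strawberry' and '.*berry' match
print(all_matches('blueberry'))   # [3]
```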
This is what I've written, but it feels wrong. The regex engine should be able to tell me which alternative matched; instead, I re-run the individual regexes one by one, which I suspect is terribly inefficient (keep in mind that the example above is simplified; the real regexes are more complicated and more numerous):
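import re

regs_compiled = [re.compile(reg) for reg in regs]
# fullmatch anchors the whole alternation at both ends; concatenating
# '^' + 'a|b' + '$' would anchor only the first and last alternatives.
regs_combined = re.compile('|'.join('(?:{})'.format(reg) for reg in regs))

def find(s):
    # Pre-check: does any regex match the whole string?
    if regs_combined.fullmatch(s):
        # Re-test each regex in order to recover the leftmost one that matched.
        for i, reg in enumerate(regs_compiled):
            if reg.fullmatch(s):
                return i
    return -1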
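A side note on building the combined pattern: | has the lowest precedence in a regex, so '^' + 'a|b|c' + '$' anchors only the first alternative at the start and only the last one at the end. The snippet below is a quick check, not a fix; it compares that form with one that wraps the whole alternation in a non-capturing group.

```python
import re

regs = [r'apple', r'strawberry', r'pear', r'.*berry', r'fruit: [a-z]*']
alts = '|'.join('(?:{})'.format(reg) for reg in regs)

# '^' binds only to the first alternative and '$' only to the last one.
ungrouped = re.compile('^' + alts + '$')
# Wrapping the alternation in a non-capturing group anchors every alternative.
grouped = re.compile('^(?:' + alts + ')$')

print(bool(ungrouped.match('applesauce')))  # True: '^(?:apple)' matches a prefix
print(bool(grouped.match('applesauce')))    # False
```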
Is there a way to find out which subexpression(s) were used to match the regex without looping explicitly?
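One possible approach, sketched under some assumptions: wrap each regex in its own capturing group and read Match.lastindex, the number of the last capturing group that closed. Because a wrapper group closes after any groups nested inside it, lastindex identifies the wrapper of the alternative that matched, and because Python's engine tries alternatives from left to right, that is the leftmost regex that fully matches. The names build_finder and group_to_index are hypothetical. The sketch assumes the individual regexes don't use numbered backreferences (\1 etc.), which the extra groups would shift, or global inline flags such as (?i), which recent Python versions reject in the middle of a pattern.

```python
import re

regs = [r'apple', r'strawberry', r'pear', r'.*berry', r'fruit: [a-z]*']

def build_finder(regs):
    """Return find(s) -> index of the leftmost regex that fully matches s, or -1."""
    parts = []
    group_to_index = {}  # wrapper group number -> position in regs
    group = 1
    for i, reg in enumerate(regs):
        # One capturing group around each regex (this also isolates any '|'
        # inside it). Later group numbers are shifted by the capturing groups
        # the earlier regexes contain themselves, counted via Pattern.groups.
        parts.append('({})'.format(reg))
        group_to_index[group] = i
        group += 1 + re.compile(reg).groups
    combined = re.compile('|'.join(parts))

    def find(s):
        # fullmatch anchors the whole alternation at both ends; alternatives
        # are tried left to right, so the leftmost full match wins.
        m = combined.fullmatch(s)
        # lastindex is None only when regs is empty (the pattern is '').
        if m is None or m.lastindex is None:
            return -1
        # The wrapper closes after its nested groups, so lastindex is the
        # wrapper group of the alternative that matched.
        return group_to_index[m.lastindex]

    return find

find = build_finder(regs)
print(find('strawberry'))  # 1
print(find('blueberry'))   # 3
```

This does one pass of the combined regex per lookup instead of a pre-check followed by a loop; whether that is measurably faster depends on the real regexes.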