I'm trying to capture a specific pattern from a large text document. The pattern is quite simple: if a line begins with a word and ends with the same word, I want to capture that line. For example:
phase1 begin trial end phase1
phase2.begin distribution end phase2
phase3 allow buying in phase3 but
phase4 has no end
phase5 is test of phase
In this document I would expect a match on lines 1 and 2, since both begin and end with the same word ([a-zA-Z0-9]). Line 3 should not match because it does not end with that word (although the word does appear again inside the line), and in lines 4 and 5 the first word never appears again at all. I tried this pattern:
^([a-zA-Z0-9]*\b)(.+)(\b\1)$
It should have forced the string to end right after the backreference, but instead it matched all five lines (the groups are not matched, but each line still gets a full match). I think I am missing some fundamental understanding of regex, since I cannot see how to force it to match this specific pattern. It would be helpful if someone could explain the flaw in my thinking.
I have searched for this pattern, but most people are trying to match known words. The complication here is that I want to match any line as long as it starts and ends with the same arbitrary word (as in the example, there might be any number of phases, or any other arbitrary word, in the document). I am using regex101 to test my pattern.
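The behaviour described in the question can be reproduced outside regex101. This is a minimal sketch, assuming Python's re module with re.MULTILINE stands in for regex101's multiline mode; the names SAMPLE, ORIGINAL and first_groups are made up for illustration. It lists what group 1 captured for each match, which hints at where the extra matches come from: the * quantifier lets group 1 be empty, and a backreference to an empty group matches trivially at the end of any line.

```python
import re

# The five sample lines from the question, joined into one document.
SAMPLE = "\n".join([
    "phase1 begin trial end phase1",
    "phase2.begin distribution end phase2",
    "phase3 allow buying in phase3 but",
    "phase4 has no end",
    "phase5 is test of phase",
])

# The pattern from the question, unchanged. re.MULTILINE makes ^ and $
# anchor at every line rather than only at the ends of the whole text.
ORIGINAL = re.compile(r"^([a-zA-Z0-9]*\b)(.+)(\b\1)$", re.MULTILINE)


def first_groups(pattern, text):
    """Return the text captured by group 1 for every match of pattern."""
    return [m.group(1) for m in pattern.finditer(text)]


# Every line matches, but group 1 is only non-empty on the first two lines.
print(first_groups(ORIGINAL, SAMPLE))
```

On lines 3 to 5 the engine backtracks until group 1 is the empty string: \b still succeeds at the start of the line, (.+) swallows the whole line, and the empty \1 followed by $ succeeds at the end.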
^([a-zA-Z0-9]+)\b(.+)\b(\1)$ regex101.com/r/Y2uHdt/1 - The fourth bird
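To check the pattern from the comment above against the sample lines, here is a small sketch, again assuming Python's re module rather than regex101 (LINES, FIXED and matching_lines are hypothetical names). Compared with the pattern in the question, the first group uses + instead of *, so the first word cannot be empty, and the \b sits outside the group.

```python
import re

# The five sample lines from the question.
LINES = [
    "phase1 begin trial end phase1",
    "phase2.begin distribution end phase2",
    "phase3 allow buying in phase3 but",
    "phase4 has no end",
    "phase5 is test of phase",
]

# Pattern from the comment: the first word must have at least one character,
# and the backreference must start at a word boundary and end the line.
FIXED = re.compile(r"^([a-zA-Z0-9]+)\b(.+)\b(\1)$")


def matching_lines(lines):
    """Return only the lines that begin and end with the same word."""
    return [line for line in lines if FIXED.match(line)]


for line in matching_lines(LINES):
    print(line)
```

Each line is tested on its own here, so no multiline flag is needed. When matching against the whole document at once, the pattern would need re.MULTILINE (the m flag on regex101).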