There are many considerations to bake into this task.
- Do you need word boundaries for your target substring? Without them you may perform unintentional matching; but only you can decide this point for your project.
- Do you need case-insensitive matching? I'll guess: yes.
- What happens if the target substring is the second-to-last or last word in the string? Do you want to allow just one or zero words to be removed in that case? I'll guess: yes.
- You need to consider/include punctuation, right? I'll guess: yes.
- Might your target substring contain regex-sensitive characters? If so, preg_quote() is recommended. I'll guess: no, but if you are unsure, call preg_quote() on the needle (passing the pattern delimiter as the second argument) before injecting it into the pattern.
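To illustrate the preg_quote() point, here is a minimal sketch. The helper name removeWordsAfter() and its $maxWords parameter are hypothetical (they are not part of the original snippet), and the sample sentence is made up for the demonstration:

```php
<?php
// Hypothetical helper: removes up to $maxWords words that follow $needle.
function removeWordsAfter(string $text, string $needle, int $maxWords = 2): string
{
    // Escape regex-sensitive characters; "~" is passed because it is the
    // pattern delimiter and preg_quote() does not escape it by default.
    $quoted = preg_quote($needle, '~');
    $pattern = '~\b' . $quoted . '\b\S*\K(?:\s\S+){0,' . $maxWords . '}~i';
    return preg_replace($pattern, '', $text);
}

$text = 'Codes 3x50 alpha beta and 3.50 gamma delta epsilon.';

// Unquoted: the dot matches any character, so "3x50" is matched as well.
echo preg_replace('~\b3.50\b\S*\K(?:\s\S+){0,2}~i', '', $text), "\n";
// Codes 3x50 and 3.50 epsilon.

// Quoted: only the literal "3.50" is matched.
echo removeWordsAfter($text, '3.50'), "\n";
// Codes 3x50 alpha beta and 3.50 epsilon.
```

Note that the \b word boundaries assume the needle starts and ends with a word character; a needle such as "C++" would probably need different boundary handling.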
Here is a demo that uses every word of the sample string as the needle:
$txt = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.';
$needles = str_word_count(strtolower($txt), 1); // format 1: return the words as an array
foreach ($needles as $needle) {
    // use '(($0))' as the replacement to see the substring that is removed
    echo "($needle) => ", preg_replace('~\b' . $needle . '\b\S*\K(?:\s\S+){0,2}~i', '', $txt), "\n";
}
Output:
(lorem) => Lorem sit amet, consetetur sadipscing elitr, sed diam.
(ipsum) => Lorem ipsum amet, consetetur sadipscing elitr, sed diam.
(dolor) => Lorem ipsum dolor consetetur sadipscing elitr, sed diam.
(sit) => Lorem ipsum dolor sit sadipscing elitr, sed diam.
(amet) => Lorem ipsum dolor sit amet, elitr, sed diam.
(consetetur) => Lorem ipsum dolor sit amet, consetetur sed diam.
(sadipscing) => Lorem ipsum dolor sit amet, consetetur sadipscing diam.
(elitr) => Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
(sed) => Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed
(diam) => Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.
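As the comment in the snippet suggests, swapping the empty replacement for '(($0))' makes the removed portion visible. A small sketch for a single needle (the choice of "dolor" is arbitrary):

```php
<?php
$txt = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.';
$needle = 'dolor';

// Because of \K, $0 holds only the text matched after the needle,
// i.e. exactly the part that would otherwise be removed.
$marked = preg_replace('~\b' . $needle . '\b\S*\K(?:\s\S+){0,2}~i', '(($0))', $txt);

echo $marked, "\n";
// Lorem ipsum dolor(( sit amet,)) consetetur sadipscing elitr, sed diam.
```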
Breakdown:
~ #pattern delimiter
\b'.$needle.'\b #match needle as a whole word
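\S* #match zero or more non-whitespace characters trailing the needle; because of the preceding \b, the first of them must be a non-word character (e.g. punctuation). This may be replaced with [[:punct:]]* if more desirable/accurate
\K #reset the start of the fullstring match, so everything matched so far is kept and only what follows is replaced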
(?:\s\S+){0,2} #match zero, one or two sequences of: a whitespace character followed by one or more non-whitespace characters
~ #pattern delimiter
i #case-insensitive pattern modifier
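To make the role of \K more concrete, the sketch below compares it with a capture-group version that re-inserts the kept text through '$1'. The capture-group variant is only an illustrative alternative, not something taken from the snippet above; for this sample string both should give the same result:

```php
<?php
$txt = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.';
$needle = 'amet';

// \K version: the needle and its trailing punctuation are dropped from
// the match, so replacing with '' removes only the following words.
$withK = preg_replace('~\b' . $needle . '\b\S*\K(?:\s\S+){0,2}~i', '', $txt);

// Capture-group version: the needle part is matched too, then restored via $1.
$withGroup = preg_replace('~(\b' . $needle . '\b\S*)(?:\s\S+){0,2}~i', '$1', $txt);

var_dump($withK === $withGroup); // bool(true)
echo $withK, "\n";
// Lorem ipsum dolor sit amet, elitr, sed diam.
```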