PHP remove next two words after a specific word

0

votes

How can I remove the next two words after a specific word with preg_replace in PHP? For example:

String: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.
Specific word: ipsum
New string: Lorem ipsum amet, consetetur sadipscing elitr, sed diam.

That's my current code:

$txt = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.
Specific word: ipsum";
$str= preg_replace('/\W\w+\s*(\W*)$/', '$1', $txt);
echo $str;

But it is just removing the last word of the string.

Thank you. With best regards

php regex preg-replace

Possible duplicate of Remove two words after a specific word - revo

@Floppy Do you want to match punctuation? Do you need case-insensitive matching? Do you want to match whole words only? What if the target word occurs more than once... do you want to make multiple removals? Will your target word be decided by you or will it come from an untrustworthy source? What is your expected result when there are insufficient words following the target word? - mickmackusa

@Floppy Please respond to my requests for question clarification. - mickmackusa

2

votes

You can use (?<=ipsum)(?: \w+){2}, but if you want to include punctuation marks use (?<=ipsum)(?: [A-Za-z,.!]+){2}.

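function remove2w($anchor, $text, $number = 2) {
    // preg_quote() escapes regex metacharacters (and the "/" delimiter) in the anchor
    $pattern = sprintf('/(?<=%s)(?: \w+){%d}/', preg_quote($anchor, '/'), $number);
    return preg_replace($pattern, '', $text);
}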

Output:

remove2w('ipsum', 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.')
>>> Lorem ipsum amet, consetetur sadipscing elitr, sed diam.
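
To make the difference between the two patterns concrete, here is a minimal sketch. The helper name removeAfter(), its $tokenClass parameter, and the sample sentence with a comma after "dolor" are assumptions made up for this demo; they are not part of the question.

```php
<?php
// Hypothetical helper: remove $count space-separated tokens after $anchor.
// $tokenClass is the regex character class a token may consist of.
function removeAfter(string $anchor, string $text, string $tokenClass, int $count = 2): string
{
    $pattern = sprintf('/(?<=%s)(?: %s+){%d}/', preg_quote($anchor, '/'), $tokenClass, $count);
    return preg_replace($pattern, '', $text);
}

$text = 'Lorem ipsum dolor, sit amet';

// \w cannot match the comma, so two full tokens are never found and nothing is removed.
echo removeAfter('ipsum', $text, '\w'), "\n";          // Lorem ipsum dolor, sit amet

// The punctuation-aware class accepts "dolor," as a token.
echo removeAfter('ipsum', $text, '[A-Za-z,.!]'), "\n"; // Lorem ipsum amet
```

So the \w variant silently does nothing as soon as punctuation is attached to one of the two words, which is worth knowing before choosing it.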

1

votes

preg_replace() is flexible enough for this:

<?php
$needle = "ipsum";
$haystack = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam. ";
$pattern = sprintf('|(%s)\s+\w+\s+\w+|', $needle);
var_dump(preg_replace($pattern, '$1', $haystack));

The output is:

string(57) "Lorem ipsum amet, consetetur sadipscing elitr, sed diam. "

1

votes

Another way is to use explode(). You can split the string on spaces, then call array_search() for your $word, which gives you its index in the array, and then simply unset() the next 2 elements:

<?php
$txt = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.";
$word = "ipsum";
$txtArr = explode(" ", $txt);
$i = array_search($word, $txtArr);
unset($txtArr[$i + 2]);
unset($txtArr[$i + 1]);
var_dump(implode(" ", $txtArr));

Result

string(56) "Lorem ipsum amet, consetetur sadipscing elitr, sed diam."


Note: you'll need to do some error handling in case the $word is not found
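
The error handling mentioned in the note could look roughly like this. It is only a sketch: the function name removeWordsAfter() is hypothetical, returning the text unchanged when the word is missing is an assumed policy, and array_splice() is swapped in for the two unset() calls because it tolerates running past the end of the array.

```php
<?php
// Hypothetical helper: drop up to $count words after the first exact match of $word.
function removeWordsAfter(string $txt, string $word, int $count = 2): string
{
    $words = explode(' ', $txt);
    $i = array_search($word, $words, true); // strict search; returns false when not found

    if ($i === false) {
        return $txt; // assumption: leave the text untouched when the word is absent
    }

    // Removes at most $count elements and simply stops at the end of the array.
    array_splice($words, $i + 1, $count);

    return implode(' ', $words);
}

echo removeWordsAfter('Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.', 'ipsum'), "\n";
// Lorem ipsum amet, consetetur sadipscing elitr, sed diam.
echo removeWordsAfter('Lorem ipsum dolor', 'dolor'), "\n";   // Lorem ipsum dolor
echo removeWordsAfter('Lorem ipsum dolor', 'missing'), "\n"; // Lorem ipsum dolor
```

Without the check, a failed array_search() returns false, and false + 1 evaluates to 1, so the words at indexes 1 and 2 would be removed by mistake. Also note that the comparison is on whole array elements, so searching for "amet" will not find "amet,".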

0

votes

There are many considerations to bake into this task.

Do you need word boundaries for your target substring? Without them you may perform unintentional matching; but only you can decide this point for your project.
Do you need case-insensitive matching? I'll guess: yes.
What happens if the target substring is the second last or last word in the string? Do you want to allow one or zero words to be omitted? I'll guess: yes.
You need to consider/include punctuation, right? I'll guess: yes.
Might your target substring contain regex-sensitive characters? If so, preg_quote() is recommended. I'll guess: no, but you can call preg_quote() on the needle before injecting it into the pattern if you are unsure.
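
As a small illustration of the preg_quote() point, here is a sketch with a needle that contains a dot. The helper name removeFollowing(), its $quote switch, and the sample sentence are assumptions invented for this demo.

```php
<?php
// Hypothetical helper: remove up to two "words" after $needle.
// $quote toggles preg_quote() so both behaviours can be compared.
function removeFollowing(string $needle, string $txt, bool $quote = true): string
{
    // The second argument of preg_quote() also escapes the "~" delimiter.
    $literal = $quote ? preg_quote($needle, '~') : $needle;
    return preg_replace('~\b' . $literal . '\b\S*\K(?:\s\S+){0,2}~i', '', $txt);
}

$txt = 'Use v1x0 alpha beta, then v1.0 gamma delta end';

// Unquoted: "." matches any character, so "v1x0" is treated as a needle too.
echo removeFollowing('v1.0', $txt, false), "\n"; // Use v1x0 then v1.0 end

// Quoted: only the literal "v1.0" is matched.
echo removeFollowing('v1.0', $txt), "\n";        // Use v1x0 alpha beta, then v1.0 end
```

One caveat: \b only works as intended when the needle starts and ends with a word character, so a needle such as "(net)" would need a different boundary check.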

Here is a complete battery of needles:

$txt = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.';
$needles = str_word_count(strtolower($txt),1);
foreach($needles as $needle){
    echo "($needle) => ",preg_replace('~\b'.$needle.'\b\S*\K(?:\s\S+){0,2}~i','',$txt),"\n";  // use '(($0))' as replacement to see the substring that is removed
}

Output:

(lorem) => Lorem sit amet, consetetur sadipscing elitr, sed diam.
(ipsum) => Lorem ipsum amet, consetetur sadipscing elitr, sed diam.
(dolor) => Lorem ipsum dolor consetetur sadipscing elitr, sed diam.
(sit) => Lorem ipsum dolor sit sadipscing elitr, sed diam.
(amet) => Lorem ipsum dolor sit amet, elitr, sed diam.
(consetetur) => Lorem ipsum dolor sit amet, consetetur sed diam.
(sadipscing) => Lorem ipsum dolor sit amet, consetetur sadipscing diam.
(elitr) => Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
(sed) => Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed
(diam) => Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.

Breakdown:

~                #pattern delimiter
\b'.$needle.'\b  #match needle as a whole word
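\S*              #match zero or more non-whitespace characters that trail the needle (e.g. punctuation); \b guarantees the first of them is a non-word character.  This may be replaced with [[:punct:]]* if more desirable/accurate
\K               #reset the start of the reported match, so everything matched so far is kept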
(?:\s\S+){0,2}   #match zero, one or two sequences of: a whitespace character followed by one or more non-whitespace characters
~                #pattern delimiter
i                #case-insensitive pattern modifier
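
To see what \K does in practice, the '(($0))' replacement mentioned in the code comment can be tried on a single needle. This is just a sketch with the needle "amet" hard-coded; the variable name $marked is an assumption.

```php
<?php
$txt = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam.';
$pattern = '~\bamet\b\S*\K(?:\s\S+){0,2}~i';

// $0 holds only the part matched after \K; "amet," was consumed but is not replaced.
$marked = preg_replace($pattern, '(($0))', $txt);

echo $marked, "\n";
// Lorem ipsum dolor sit amet,(( consetetur sadipscing)) elitr, sed diam.
```

The double parentheses wrap exactly the substring that an empty replacement string would delete.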
