2
votes

I need to replace some sentences in a bunch of documents. The sentences are nearly the same everywhere, but in some documents they contain line breaks or have missing or added words/characters. I tried matching on the first and last words, but that's not accurate.

Is there any way, or does anyone have an idea, how to replace sentences that only nearly match?

Example: Let's say I want to replace the following sentence.

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt

Here is the sentence with a break

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, 
sed diam nonumy eirmod tempor invidunt

A missing comma

Lorem ipsum dolor sit amet, consetetur sadipscing elitr
sed diam nonumy eirmod tempor invidunt

And missing words

Lorem ipsum dolor sit amet sadipscing elitr, sed diam nonumy invidunt
1
It's possible, but your regexp might end up being thiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiis long. I'm unsure about native support. - David Harris
You would first have to define which words/characters your sentence must contain. Once that is defined, put .* or some stronger constraint (e.g. (,|\.|\n|\t){0,3}, meaning no more than 3 of , or . or newline or tab) between the must-haves. - fo_x86
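A rough sketch of the "must-haves with a tolerant gap" idea from the comment above. The helper name `buildLoosePattern`, the chosen must-have words, and the 40-character gap limit are all assumptions for illustration, not something from the thread; tune them to your data.

```php
<?php
// Hypothetical helper: builds a tolerant regex from the words that must appear, in order.
function buildLoosePattern(array $mustHaves, int $maxGap = 40): string
{
    // Escape each word so it is matched literally.
    $quoted = array_map(fn($w) => preg_quote($w, '/'), $mustHaves);
    // Between two must-haves allow up to $maxGap arbitrary characters.
    // The "s" modifier lets "." match newlines, so line breaks are tolerated.
    return '/' . implode('.{0,' . $maxGap . '}?', $quoted) . '/s';
}

// Assumed must-have words, picked from the example sentence.
$pattern = buildLoosePattern(['Lorem', 'ipsum', 'sit', 'amet', 'sadipscing', 'elitr', 'sed', 'diam', 'nonumy', 'invidunt']);

$doc = "Intro. Lorem ipsum dolor sit amet, consetetur sadipscing elitr\nsed diam nonumy eirmod tempor invidunt. Outro.";
$result = preg_replace($pattern, 'REPLACEMENT', $doc);
```

The bounded, lazy gap keeps the match from running far past the sentence, which a plain `.*` would happily do.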
I would start off by trying to normalize your strings. For example, you could decide that line endings without a comma should get one, or that some words are irrelevant to your string, so you just remove them. There HAS to be some logic to your strings, or else it will be very difficult to replace them. - Bjørne Malmanger
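A minimal sketch of what such a normalization step might look like. The helper name `normalizeSentence` and the specific rules (lowercase, drop punctuation, collapse whitespace) are assumptions; it also assumes plain ASCII text, since `strtolower()` and the character class below do not handle accented letters.

```php
<?php
// Hypothetical helper: reduces a string to a canonical form for comparison.
function normalizeSentence(string $text): string
{
    $text = strtolower($text);                          // ignore case differences
    $text = preg_replace('/[^a-z0-9\s]+/', '', $text);  // drop punctuation such as commas
    $text = preg_replace('/\s+/', ' ', $text);          // collapse line breaks, tabs and repeated spaces
    return trim($text);
}

$original  = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt";
$withBreak = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, \nsed diam nonumy eirmod tempor invidunt";
$noComma   = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr\nsed diam nonumy eirmod tempor invidunt";

$sameAfterNormalizing = normalizeSentence($original) === normalizeSentence($withBreak)
    && normalizeSentence($original) === normalizeSentence($noComma);
```

This only covers the line-break and missing-comma variants. The variant with missing words still differs after normalizing, so it would need a similarity measure on top.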

1 Answer

1
votes

Untested, but check out similar_text():

    <?php
    $threshold = 80; // Similarity threshold, in percent
    $par1 = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt";
    $par2 = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, 
    sed diam nonumy eirmod tempor invidunt";

    // similar_text() returns the number of matching characters;
    // the percentage is written to the third, by-reference argument.
    similar_text($par1, $par2, $percent);
    if ($percent >= $threshold) {
        // Close enough to count as a variant of $par1: correct the paragraph
        $par2 = $par1;
    }
    ?>
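To apply the same idea to a whole document, something along these lines might work. This is only a sketch: the helper name `replaceNearMatches` is made up, it assumes the candidate passages are separated by blank lines, and the 80% default threshold is a guess that needs tuning against real documents.

```php
<?php
// Hypothetical helper: replaces every paragraph that is "close enough" to $canonical.
function replaceNearMatches(string $document, string $canonical, string $replacement, float $threshold = 80.0): string
{
    // Assumption: candidate passages are separated by blank lines.
    $paragraphs = preg_split('/\R\s*\R/', $document);
    foreach ($paragraphs as $i => $paragraph) {
        // Collapse line breaks and repeated spaces before comparing.
        $candidate = trim(preg_replace('/\s+/', ' ', $paragraph));
        // $percent receives the similarity as a percentage (0 to 100).
        similar_text($canonical, $candidate, $percent);
        if ($percent >= $threshold) {
            $paragraphs[$i] = $replacement;
        }
    }
    return implode("\n\n", $paragraphs);
}

$canonical = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt";
$document  = "Some unrelated heading\n\n"
           . "Lorem ipsum dolor sit amet sadipscing elitr, sed diam nonumy invidunt\n\n"
           . "Another paragraph entirely.";

$fixed = replaceNearMatches($document, $canonical, 'FIXED');
```

Keep in mind that similar_text() gets slow on long strings, so for large documents it may be worth pre-filtering candidates (for example by length) before comparing.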