7
votes

Example of a $text variable:

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Splitting it in half:

$half = strlen($text) / 2;

will get me to the o character in consequat.

How can I find the position of the nearest sentence delimiter (dot) in the middle of the text? In this example it's 7 characters after that o.

Also, this text contains HTML code.
I want to ignore the HTML when finding the halfway point of the text, and ignore dots inside HTML attributes etc.

3
Clearly you know conceptually what needs to be done. I don't see the problem - what have you tried and why didn't it work? - Mahmoud Al-Qudsi
What if the next dot is not the end of the sentence, e.g. an example given? - alex
Well, then I guess it will still be considered the end of a sentence. I don't know any way around that :) - Alex

3 Answers

4
votes

Take a look at substr, strip_tags and strpos. strip_tags removes all the HTML tags from the string, and strpos, given a starting offset, finds the position of the next dot.

$string = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';
$string = strip_tags($string);
$half = intval(strlen($string)/2);
echo substr($string, 0, strpos($string, '.', $half)+1);

Make sure a dot actually exists at or after $half. If none does, strpos returns false, false + 1 evaluates to 1, and substr returns only the first character of the string.

Perhaps something like this?

// look the dot up once and reuse the result
$pos = strpos($string, '.', $half);
if ($pos !== false)
    echo substr($string, 0, $pos + 1);
else
    echo substr($string, 0, $half) . '...';
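
The snippets above cut the tag-stripped text, so the markup is lost. If the cut position is needed in the original HTML instead, a rough sketch could scan the string, skip everything between < and >, and look for the first dot at or after the midpoint of the visible text. The function name htmlSplitOffset is made up for this illustration, and the scanner is deliberately naive: it assumes no > inside attribute values, no comments or scripts, and it counts entities such as &amp; as several characters.

```php
<?php
// Hypothetical helper (not part of the answer above): returns the byte offset
// in $html just past the first dot at or after the midpoint of the visible
// text, or null if there is no such dot.
function htmlSplitOffset(string $html): ?int
{
    $visible = [];   // offsets in $html of bytes that lie outside tags
    $inTag = false;
    for ($i = 0, $n = strlen($html); $i < $n; $i++) {
        if ($html[$i] === '<') {
            $inTag = true;
        } elseif ($html[$i] === '>' && $inTag) {
            $inTag = false;
        } elseif (!$inTag) {
            $visible[] = $i;
        }
    }

    // midpoint measured in visible characters only
    $half = intdiv(count($visible), 2);
    for ($k = $half, $m = count($visible); $k < $m; $k++) {
        if ($html[$visible[$k]] === '.') {
            return $visible[$k] + 1;   // include the dot itself
        }
    }
    return null;
}
```

Used as substr($html, 0, htmlSplitOffset($html)), this keeps the tags in the first half, but any element still open at the cut point stays unclosed, so the result may need tidying before output.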
3
votes

Assuming your sentences can end with characters other than a period, you could look at this:

$s = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';

// find center (rounded down)
$mid = intdiv(strlen($s), 2);
// find range of characters from center that are not ?, ! or .
$r = strcspn($s, '.!?', $mid);

// remember to include the punctuation character
echo substr($s, 0, $mid + $r + 1);

You may need to tweak it a little, but it should do its job well. For more advanced stuff you're treading into NLP (natural language processing) territory, for which PHP libraries are also available:

http://sourceforge.net/projects/nlp/
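
One possible tweak: the strcspn snippet above only searches forward from the center, while the question asks for the nearest delimiter. A minimal sketch that also looks backward and picks whichever delimiter is closer might look like this. The function name cutNearMiddle and its $delims parameter are invented for the example, and it works on bytes, so multibyte text would need the mb_* equivalents.

```php
<?php
// Hypothetical helper: cut $s just after the sentence delimiter that is
// closest to the middle of the string, searching in both directions.
function cutNearMiddle(string $s, string $delims = '.!?'): string
{
    $len = strlen($s);
    $mid = intdiv($len, 2);

    // number of non-delimiter characters from $mid forward
    $fwd = strcspn($s, $delims, $mid);
    // number of non-delimiter characters from $mid - 1 backward,
    // found by scanning the reversed first half
    $back = strcspn(strrev(substr($s, 0, $mid)), $delims);

    $foundFwd = $mid + $fwd < $len;   // a delimiter exists at or after $mid
    $foundBack = $back < $mid;        // a delimiter exists before $mid

    // the backward delimiter is $back + 1 characters away from $mid
    if ($foundBack && (!$foundFwd || $back + 1 < $fwd)) {
        return substr($s, 0, $mid - $back);
    }
    // otherwise cut after the forward delimiter; if neither direction has
    // one, the length overshoots and substr returns the whole string
    return substr($s, 0, $mid + $fwd + 1);
}
```

When both delimiters are equally far away, this sketch prefers the forward one, which keeps it consistent with the original snippet.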

-2
votes
function abbrevia($str, $maxChars) {
    if (strlen($str) <= $maxChars) return $str;
    $limit = $maxChars;
    $len = strlen($str);
    // advance to the next space or punctuation mark without running past the end of the string
    while ($limit < $len && strpos(" .;,!", $str[$limit]) === false) $limit++;
    return substr($str, 0, $limit) . "...";
}

You can modify this function to suit your needs.