I am importing feeds from various sources into WordPress and scraping some data from them. Unfortunately, the HTML is not always clean: some tags are left open, and they break the layout. I want to close all open tags in the content.

I read that I could use DOMDocument for that, so I wrote the code below:

$yourText = apply_filters( 'the_content', get_the_content() );

$doc = new DOMDocument();
$doc->loadHTML($yourText, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$yourText = $doc->saveHTML();
echo $yourText;

I suspect that the content is not being passed or parsed properly, but since I am not a coder, I don't know what I did wrong. Some pointers would be welcome.

1 Answer

Hello, maybe the following code solves your problem:

$yourText = apply_filters( 'the_content', get_the_content() );

// Collect parser warnings about malformed markup instead of printing them
libxml_use_internal_errors( true );

$doc = new DOMDocument();
// Wrap the content in a single <div> so the fragment has exactly one root element.
// The XML prolog makes libxml read the input as UTF-8 (assumes the content is UTF-8).
$doc->loadHTML(
    '<?xml encoding="UTF-8"><div>' . $yourText . '</div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);
libxml_clear_errors();

// Serialize only the children of the wrapper <div>, dropping the wrapper itself
$wrapper  = $doc->getElementsByTagName('div')->item(0);
$yourText = '';
foreach ($wrapper->childNodes as $child) {
    $yourText .= $doc->saveHTML($child);
}

echo $yourText;

I got a part of the solution from here
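
If it helps, the DOMDocument round trip can also be packaged as a small function and tried on deliberately broken markup outside of WordPress. This is only a sketch: `close_open_tags()` is a hypothetical helper name (not a WordPress or PHP built-in), it assumes the input is UTF-8, and it wraps the fragment in a temporary `<div>` so that the parser sees a single root element.

```php
<?php
/**
 * Hypothetical helper: re-serializes an HTML fragment through DOMDocument
 * so that tags left open in the input are closed in the output.
 * Assumes UTF-8 input and that the dom/libxml extensions are enabled.
 */
function close_open_tags(string $html): string
{
    // Keep warnings about malformed markup out of the page output
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // The wrapper <div> gives the fragment one root element; the XML prolog
    // tells libxml to treat the bytes as UTF-8
    $doc->loadHTML(
        '<?xml encoding="UTF-8"><div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    // Discard the collected warnings and restore the previous error mode
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $doc->getElementsByTagName('div')->item(0);
    if ($wrapper === null) {
        return $html; // Nothing usable was parsed; hand back the input untouched
    }

    // Serialize the wrapper's children only, so the wrapper itself is dropped
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Example input: the <b> and the second <p> are never closed
echo close_open_tags('<p>Hello <b>world</p><p>Next');
```

In a template, the call would presumably look like `echo close_open_tags( apply_filters( 'the_content', get_the_content() ) );`. If only tag balancing is needed, WordPress's own `force_balance_tags()` may already be enough and avoids the DOM extension altogether.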