I have a markup HTML as below:
<body>
<div>......</div>
............
<div class="entry-content">
<div class="code1 code2">(ads.....);</div>
<p><img src="https://www..."></img></p>
<h2> title </h2>
<div class="code1-block code2">(ads.....);</div>
<div class="data1 dta-ta1">
<ul><li><p> text</p></li>
<li><span> text2 </span></li>
<li><span> text3 </span></li>
<div class="codex1 code-block"><span>(ads ....); </span></div>
<li><span> text4 </span></li>
<div class="codex1 code-block"><span>(ads ....); </span></div>
</ul>
</div>
<div class="codex2-block code2">(ads.....);</div>
<div class="data2-entry dta-ta2">
<p>
<span> text5</span>
</p>
<p> text6 </p>
<p> text7 </p
<div class="codex1 code-block"><span>(ads ....); </span></div>
<li><span> text8 </span></li>
<div class="codex1 code-block"><span>(ads ....); </span></div>
</div>
</div>
</body>
I've tried to "go into div with class="entry-content"
get all texts from its child nodes excluding child nodes with class= "code1", "code2", "codex1", "codex2"
My code as below just goes to the div and gets all texts from child nodes. However, I can not remove text from the child nodes with code1 & code2. I appreciate for your supports. Thanks.
$classname='entry-content';
$a = new DOMXPath($dom);
$query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";
$list = $a->query($query);
if ($list->length > 0) {
foreach ($list as $element) {
$nodes = $element->childNodes;
foreach ($element as $node) {
$bodytext = trim(preg_replace('/[\r\n]+/', ' ', $node->nodeValue));
$bodyContent .= '<p>' . $bodytext . '</p>';
}
}
}
My expected output:
https://www...
title
text2
text3
text4
text5
text6
text7
text8