I'm trying to extract the structure of an XML document in PHP without expanding the entities within it. I'm aware that entities are normally resolved as part of parsing, and that leaving them unresolved means the XML may not be well-formed. However, I'm parsing XML fragments that may not include the usual document prolog (the XML declaration and DOCTYPE), so the entity declarations will be missing.
Ideally I'd like a callback when an undeclared entity is found so that I can handle it myself. Neither XMLReader nor xml_parser seems to offer a way to turn off the errors that undeclared entities produce during parsing.
Is there an easy way to do this, or will I need to fall back on my own parser? That wouldn't be a disaster: I only need to parse a few tags and then keep all the text inside them.
Here's an example of some DocBook from the first chapter of the official DocBook guide:
<chapter id="ch-gssgml">
<?dbhtml filename="ch01.html"?>
<chapterinfo>
<pubdate>$Date$</pubdate>
<releaseinfo>$Revision$</releaseinfo>
</chapterinfo>
<title>Getting Started<?lb?>with &SGML;/&XML;</title>
<para>
...
</para>
</chapter>
Parsing this fails as soon as the parser reaches the undeclared &SGML; entity.
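For reference, here is a minimal sketch of the hand-rolled fallback mentioned above. It is not a real XML parser: it only splits a fragment into tags, processing instructions, comments, entity references, and text, and leaves every entity reference untouched. The function name `tokenizeFragment` and the token layout are hypothetical, and the sketch assumes the fragments contain no CDATA sections, no DOCTYPE, and no `>` inside attribute values.

```php
<?php
// Hypothetical helper: split an XML fragment into typed tokens without
// resolving any entity references. Assumes no CDATA sections, no DOCTYPE,
// and no ">" inside attribute values.
function tokenizeFragment(string $xml): array
{
    // Order matters: more specific patterns come before the generic open tag.
    $types = [
        'pi'      => '<\?.*?\?>',     // processing instruction, e.g. <?lb?>
        'comment' => '<!--.*?-->',
        'close'   => '</[^>]+>',
        'open'    => '<[^>]+>',
        'entity'  => '&[^;\s<&]+;',   // e.g. &SGML; (kept verbatim)
    ];

    // Split on any markup token and keep the delimiters in the result.
    $split = '~(' . implode('|', $types) . ')~s';
    $parts = preg_split($split, $xml, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $tokens = [];
    foreach ($parts as $part) {
        $type = 'text'; // anything that is not markup is plain text
        foreach ($types as $name => $re) {
            if (preg_match('~\A(?:' . $re . ')\z~s', $part)) {
                $type = $name;
                break;
            }
        }
        $tokens[] = ['type' => $type, 'value' => $part];
    }
    return $tokens;
}

// Usage: list the entity references in the title line from the example.
$tokens = tokenizeFragment('<title>Getting Started<?lb?>with &SGML;/&XML;</title>');
foreach ($tokens as $token) {
    if ($token['type'] === 'entity') {
        echo $token['value'], "\n"; // prints &SGML; then &XML;
    }
}
```

Each `entity` token is the point where I would want the callback: the caller can decide whether to substitute text, keep the reference verbatim, or report it. What I'd prefer is to get the same behaviour from one of the built-in parsers rather than maintaining something like this.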