I'm generating an XML file from data held in a MySQL database, using PHP's DOMDocument to build the XML structure, but I'm struggling with the apostrophe in some of the text. The file I'm trying to replicate from a legacy system encodes the apostrophe as &apos;. When I echo $dom->saveXML() to the screen the results look OK (the apostrophe appears as &apos;), but when I use $dom->save() to write the document to a file, the apostrophe appears as &amp;apos;, i.e. it appears to be double-escaping the text and encoding the ampersand.
I've been scouring many threads on this over the last few days to see if there is anything I've missed, and my last round of testing has been based on an earlier question here: PHP How to use &quot; entities in XML with DOMdocument, which was started nearly 4.5 years ago.
I've also tried different methods, including htmlspecialchars and htmlentities with various combinations of the flags and with double_encode set to false.
With htmlspecialchars, I'm following the advice in the PHP manual that single quotes are only translated when ENT_QUOTES is set together with one of ENT_XML1, ENT_XHTML or ENT_HTML5. I've tried all three of those.
Moving onto code examples to help illustrate the problem...
This is mostly taken from Jack's accepted answer to the question linked above, with the addition of the htmlspecialchars function wrapped around the content for the text node.
$dom1 = new DOMDocument;
$e = $dom1->createElement('description');
$content = 'single quote: \', double quote: ", opening tag: <, ampersand: &, closing tag: this has changed 02 >';
$t = $dom1->createTextNode(htmlspecialchars($content, ENT_XML1 | ENT_QUOTES,'utf-8',false));
$e->appendChild($t);
$dom1->appendChild($e);
echo '#results: '.$dom1->savexml();
$test1 = $dom1->savexml();
$dom1->save("./exports/"."testing_dom.xml");
Echoing the results to the screen gives the output I'm looking for: in addition to the ampersand, less-than and greater-than characters being encoded as &amp;, &lt; and &gt; respectively, the double quote and single quote are encoded as &quot; and &apos;.
#results: single quote: &apos;, double quote: &quot;, opening tag: &lt;, ampersand: &amp;, closing tag: this has changed 02 &gt;
The last line of the code above saves the results to a testing_dom.xml file, the contents of which appear as follows:
<?xml version="1.0"?>
<description>single quote: &amp;apos;, double quote: &amp;quot;, opening tag: &amp;lt;, ampersand: &amp;amp;, closing tag: this has changed 02 &amp;gt;</description>
Here all of the entities seem to have their leading ampersand escaped a second time, i.e. &apos; becomes &amp;apos;.
Is there something I'm missing here with saving the file?
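One way to check whether save() really behaves differently from saveXML() is to compare the two outputs as raw strings rather than as rendered in a browser. The following is a minimal sketch under a few assumptions: it uses a short sample string instead of the full test content, and it writes to a temporary file (the $path variable is hypothetical) rather than ./exports/testing_dom.xml.

```php
<?php
// Build a document the same way as above: htmlspecialchars() is applied
// before the string is handed to createTextNode().
$dom = new DOMDocument;
$e = $dom->createElement('description');
$t = $dom->createTextNode(htmlspecialchars("it's", ENT_XML1 | ENT_QUOTES, 'UTF-8', false));
$e->appendChild($t);
$dom->appendChild($e);

// Serialize to a string and to a file.
$xml = $dom->saveXML();
$path = tempnam(sys_get_temp_dir(), 'dom'); // hypothetical temp file for this test
$dom->save($path);
$fromFile = file_get_contents($path);
unlink($path);

// Escape the markup before echoing it, so a browser shows the entities
// literally instead of decoding them.
echo htmlspecialchars($xml, ENT_QUOTES, 'UTF-8'), PHP_EOL;

// Compare the two serializations and look for the double-escaped entity.
var_dump($xml === $fromFile);
var_dump(strpos($xml, '&amp;apos;') !== false);
var_dump(strpos($fromFile, '&amp;apos;') !== false);
```

If both strings contain &amp;apos;, the difference seen earlier comes from the browser decoding the echoed output, not from save().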
CDATA section? – Professor Abronsius
&apos; in your data actually is that character. You need to apply htmlspecialchars the moment you are making this debug output, for the browser to display what your data actually contains - and then you will see that that is &amp;apos;, same as you see when you check what actually got written to the file. – misorude
htmlspecialchars to createTextNode makes rather little sense to begin with - that method is named that way for a reason: its purpose is to create a text node, with what you passed as the argument becoming that text node’s actual text content. If you pass the characters &, a, p, o, s and ; in sequence, then those characters will be the text you get as result. – misorude
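Following on from the comments, here is a minimal sketch of one way to get a single &apos; in the output: pass the raw text to createTextNode() and insert each apostrophe as an entity reference node via createEntityReference(). The helper appendTextWithApos() is hypothetical (it is not part of the DOM extension), and the sample string is made up for illustration.

```php
<?php
// Hypothetical helper (not part of DOM): appends $text to $parent, emitting
// each apostrophe as an &apos; entity reference and everything else as
// ordinary text nodes.
function appendTextWithApos(DOMDocument $dom, DOMNode $parent, string $text): void
{
    foreach (explode("'", $text) as $i => $part) {
        if ($i > 0) {
            // One entity reference for every apostrophe that was split on.
            $parent->appendChild($dom->createEntityReference('apos'));
        }
        if ($part !== '') {
            // Raw text: the serializer escapes & and < itself.
            $parent->appendChild($dom->createTextNode($part));
        }
    }
}

$dom = new DOMDocument('1.0', 'UTF-8');
$e = $dom->createElement('description');
$dom->appendChild($e);
appendTextWithApos($dom, $e, "Tom & Jerry's");

// Serialize only the element, without the XML declaration.
$out = $dom->saveXML($e);
echo $out, PHP_EOL; // <description>Tom &amp; Jerry&apos;s</description>
```

To an XML parser, a literal ' and &apos; in element text are equivalent, so this is only worth doing if the consumer of the legacy file compares the bytes rather than parsing the XML.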