For example, take an ISO-8859-1 encoded XML document that contains characters outside that encoding's character set, such as the € (euro) symbol. XML allows this if the symbol is written as a numeric character reference, in this case the string &#8364;:
<?xml version="1.0" encoding="ISO-8859-1"?>
<foo>
<bar>&#8364;</bar>
</foo>
I need to obtain the bar element as a string in the same encoding as the document, i.e. ISO-8859-1. That also means preserving the character references for characters that are not part of this encoding, so the result should be the ISO-8859-1 string <bar>&#8364;</bar>.
I couldn't achieve this with the saveXML method of the DOMDocument class, since it always dumps individual elements in UTF-8 (whereas whole documents are always dumped in the encoding given in their XML declaration):
$DD = new DOMDocument();
$DD->load('foo.xml');
// Dump only the first <bar> element rather than the whole document
$dump = $DD->saveXML($DD->getElementsByTagName('bar')->item(0));
The $dump variable ended up holding the UTF-8 string <bar>€</bar>.
Note that elements are also dumped with their character references converted to actual UTF-8 characters.
So, how can I get the ISO-8859-1 string <bar>&#8364;</bar>? Are XML parsers meant to handle this sort of task, or should I just use regular expressions or something else?
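For reference, one possible workaround is to post-process the UTF-8 dump. This is only a sketch under assumptions, not necessarily the intended way to do it: it assumes the mbstring extension is available and that the target encoding is ISO-8859-1 specifically. The function names utf8ToLatin1WithReferences and dumpNodeAsLatin1 are hypothetical helpers made up for this example and are not part of the DOM extension.

```php
<?php
// Hypothetical helper: convert a UTF-8 string to ISO-8859-1, writing every
// character that ISO-8859-1 cannot represent (code points above U+00FF)
// as a decimal numeric character reference such as &#8364;.
function utf8ToLatin1WithReferences(string $utf8): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    $map = [0x100, 0x10FFFF, 0, 0x1FFFFF];
    $escaped = mb_encode_numericentity($utf8, $map, 'UTF-8');

    // Everything left is within U+0000..U+00FF, so it maps onto ISO-8859-1.
    return mb_convert_encoding($escaped, 'ISO-8859-1', 'UTF-8');
}

// Hypothetical helper: dump a single node and re-encode the result.
// Assumes saveXML() returns the node as UTF-8, as described above.
function dumpNodeAsLatin1(DOMDocument $doc, DOMNode $node): string
{
    return utf8ToLatin1WithReferences($doc->saveXML($node));
}
```

With the document above, this would be called as dumpNodeAsLatin1($DD, $DD->getElementsByTagName('bar')->item(0)). Note that the original spelling of a reference is not preserved: any character outside ISO-8859-1 comes back in decimal form, even if the source document wrote it in hexadecimal.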