I'm trying to "decode" an XML file and transform it with XSLT, but I'm having trouble with the encoding of both files. The scenario is as follows:

I have a data entry site that is entirely encoded in ISO-8859-1 (our Oracle database uses that encoding, so I can't change it). The problem is that I have two files: an XML file that describes the data entry form and an XSLT file that transforms it into HTML. Both files are saved in ISO-8859-1, and both have the corresponding header, i.e., <?xml version="1.0" encoding="ISO-8859-1"?>. Whenever I read the files and show them in the browser, the special characters (ñ, á, ¿) are shown either as UTF-8 sequences or as a question mark (depending on the method I use for showing them), but never in their "normal" representation.

My code for showing the XML file is:

<?php
$xslString = file_get_contents("catalog.xsl");
$xslString = utf8_decode($xslString);
$xslDoc = simplexml_load_string($xslString);

$xmlString = file_get_contents("questionnaire.xml");
$xmlString = utf8_decode($xmlString);
$xmlDoc = simplexml_load_string($xmlString);

$proc = new XSLTProcessor();
$proc->importStylesheet($xslDoc);
?>

I already tried several combinations of DOMDocument, iconv, and mb_convert_encoding, but they render the special characters as undecoded UTF-8, a question mark, or a double question mark.

This also messes up my data entry: if I enter one of those characters, it shows up as ? or ?? in the corresponding field in the DB, or the value gets truncated at the first special char (if I use iconv).

What am I missing? Is there a workaround? I can't convert anything to UTF-8 because of the database.

I hope I'm being clear enough, please excuse my English.

Thanks in advance!

Can you show us the XML declaration of the XML/XSL files? It should contain the encoding of the files. You need to send the browser the encoding you're generating with the XSLT. Calling utf8_decode on an XML string will not change the declaration, and there is a good chance that it will break the XML. I don't think importStylesheet() can load a SimpleXMLElement. ... As a general rule you want to use UTF-8 everywhere in your application: convert any content to this encoding, set all connections to it, ... - ThW
@ThW, sure. Do you mean the "headers" on the XML/XSL? If so, for the XSL file: <?xml version="1.0" encoding="ISO-8859-1"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="html"/> And for the XML: <?xml version="1.0" encoding="ISO-8859-1"?> <CUESTIONARIO> <NOMBRE_CUESTIONARIO>Invex</NOMBRE_CUESTIONARIO> <SECCION> <QUESTIONID>1</QUESTIONID> BTW, the XML/XSL combo is shown correctly, except for the special chars. - Arturo Barajas
Edit: added the encoding option to <xsl:output method="html">, so it now reads <xsl:output method="html" encoding="ISO-8859-1">, but it didn't work either. - Arturo Barajas
Try opening the XML/XSL directly in the browser and check whether it can display the special chars that way. The declaration says it is ISO-8859-1; this encoding shares its first 128 code points (0-127) with UTF-8. All other characters are encoded as multiple bytes in UTF-8 (2-4 in general, 2 for the characters ISO-8859-1 covers). Treating ISO-8859-1 data as UTF-8 and converting it to ISO-8859-1 again will destroy the special chars. DOMDocument uses UTF-8 internally, so any value you read from the DOM will be UTF-8, but it will convert back to ISO-8859-1 when you save the XML/HTML. Do you send the browser a Content-Type header? - ThW
The XML shows correctly, since it was saved with ISO-8859-1 encoding and the headers are sent correctly, both on the XSL and the XML side. I think my issue is more related to what you say about DOMDocument using UTF-8. Since the PHP page is in ISO-8859-1 and the XML file is read as UTF-8, it seems that the latter is messing up my whole setup. Just to be clear, the PHP script loads and shows an entry form from the XML/XSL combination. I'm not sure I understand your last question correctly. Doesn't the XSL send the header through xsl:output? Should I send it some other way? Thanks! - Arturo Barajas
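
To make the point about double conversion in the comments above concrete, here is a minimal sketch of what happens to "ñ" at the byte level. It is an illustration only: it uses mb_convert_encoding() (assuming the mbstring extension is available) rather than the utf8_decode() call from the question, and the variable names are made up for this example.

```php
<?php
// "ñ" as raw bytes in each encoding, written as escapes so that the
// encoding of this script file itself does not matter.
$latin1 = "\xF1";      // ISO-8859-1: one byte
$utf8   = "\xC3\xB1";  // UTF-8: two bytes

// Correct conversions: source and target encodings match the actual data.
$toUtf8   = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$toLatin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// The mistake: the data is already ISO-8859-1 but is decoded as if it
// were UTF-8. A lone 0xF1 byte is not valid UTF-8, so the character
// cannot survive the conversion.
$broken = mb_convert_encoding($latin1, 'ISO-8859-1', 'UTF-8');

// preg_match() with the /u modifier fails (returns false) on invalid
// UTF-8, which gives a cheap check of "is this string really UTF-8?".
$latin1IsValidUtf8 = preg_match('//u', $latin1) === 1;

// Print hex dumps instead of raw bytes so the terminal encoding is irrelevant.
printf("latin1 -> utf8 : %s\n", bin2hex($toUtf8));
printf("utf8 -> latin1 : %s\n", bin2hex($toLatin1));
printf("double decode  : %s\n", bin2hex($broken));
printf("latin1 is UTF-8: %s\n", $latin1IsValidUtf8 ? 'yes' : 'no');
```

Running a check like this on the string returned by file_get_contents() is a quick way to find out which encoding the bytes really are before deciding whether any conversion is needed at all.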

1 Answer

Hope this helps others. In the end, there were two things:

1) I was reading the XML/XSL files like this (in my original script):

<?php
$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xmlFile);
$xmlDoc->load("xmlfile.xml");
?>

which effectively changed the encoding to UTF-8. I changed the lines to:

<?php
$xmlString = file_get_contents("xmlfile.xml");
$xmlDoc = simplexml_load_string($xmlString);
?>

and removed the utf8_decode statement, and it worked like a charm. Now the special chars show up on screen as intended. As a side effect, the data entered in the form is now saved correctly to my database, so I killed two birds with one stone.
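
For anyone who prefers to stay with DOMDocument, here is a rough sketch of the whole pipeline without any manual conversion. This is not the code I ended up using, just an illustration under some assumptions: the dom and xsl extensions are loaded, renderForm() is a hypothetical helper name, and the inline strings stand in for the real questionnaire.xml and catalog.xsl. The idea is to let the parser read the encoding from the XML declaration and let xsl:output decide the encoding of the result.

```php
<?php
// Hypothetical helper: transforms ISO-8859-1 XML with an ISO-8859-1
// stylesheet without calling utf8_decode(), iconv() or similar.
function renderForm(string $xmlBytes, string $xslBytes): string
{
    $xmlDoc = new DOMDocument();
    $xmlDoc->loadXML($xmlBytes); // parser honours encoding="ISO-8859-1"

    $xslDoc = new DOMDocument();
    $xslDoc->loadXML($xslBytes);

    $proc = new XSLTProcessor();
    $proc->importStylesheet($xslDoc);

    // The output encoding is taken from <xsl:output encoding="...">.
    $html = $proc->transformToXml($xmlDoc);
    if (!is_string($html)) {
        throw new RuntimeException('XSLT transformation failed');
    }
    return $html;
}

// Sample documents; "\xF1" is the single ISO-8859-1 byte for "ñ".
$xmlBytes = '<?xml version="1.0" encoding="ISO-8859-1"?>'
    . "<CUESTIONARIO><NOMBRE_CUESTIONARIO>A\xF1o</NOMBRE_CUESTIONARIO></CUESTIONARIO>";

$xslBytes = '<?xml version="1.0" encoding="ISO-8859-1"?>'
    . '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    . '<xsl:output method="html" encoding="ISO-8859-1"/>'
    . '<xsl:template match="/"><p><xsl:value-of select="//NOMBRE_CUESTIONARIO"/></p></xsl:template>'
    . '</xsl:stylesheet>';

$html = renderForm($xmlBytes, $xslBytes);

// Values read directly from the DOM are UTF-8, whatever the file's encoding.
$doc = new DOMDocument();
$doc->loadXML($xmlBytes);
$internal = $doc->getElementsByTagName('NOMBRE_CUESTIONARIO')->item(0)->textContent;

// In a real page, the HTTP header should agree with the stylesheet:
//   header('Content-Type: text/html; charset=ISO-8859-1');
//   echo $html;
```

Note that $internal holds UTF-8 bytes, so any value pulled out of the DOM by hand (rather than through the transformation) would still need converting before it goes to an ISO-8859-1 page or database.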