I'm trying to parse an XML-formatted string into a DOMDocument. Here is my code:

mysql_connect("localhost", "MYUSERNAME", "MYPASSWORD") or die(mysql_error());
mysql_select_db("cmj_db") or die(mysql_error());

$data = mysql_query("SELECT article_id, html_data FROM articles WHERE article_id=" . (int) $_GET["article_id"]) or die(mysql_error());
$dataArray = mysql_fetch_array($data);
echo 'article: ' . $dataArray['article_id'] . '<br />';

$doc = new DOMDocument;
$doc->loadXML(Encoding::toUTF8($dataArray['html_data']));

I get the error: Warning: DOMDocument::loadXML(): Input is not proper UTF-8, indicate encoding ! Bytes: 0x96 0x20 0x6E 0x6F

There are special characters involved, so I need a Unicode encoding. When I echo the string on its own, the characters look fine. It might be helpful to note that the data has been through a long succession of conversions: I unescaped a lot of characters from an HTML encoding and then imported the result into a MySQL table (with a utf-8 charset). How can I convert this string to Unicode so I can parse it as XML?

Thanks

You need to know if the data from the table is UTF-8, and what the encoding of the connection of mysql_connect() is. If it is already in UTF-8, you might transcode it one time too many via Encoding::toUTF8 - Code4R7
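
A rough way to check both points, as a sketch only: mb_check_encoding() reports whether a byte string is already valid UTF-8, and converting an already-UTF-8 string a second time visibly corrupts it. The sample strings below are made up for illustration and are not taken from the table in the question.

```php
<?php
// Hypothetical sample data: the same text as Windows-1252 bytes and as UTF-8 bytes.
$cp1252 = "yes \x96 no";          // 0x96 is an en dash in Windows-1252
$utf8   = "yes \xE2\x80\x93 no";  // the same en dash encoded as UTF-8

var_dump(mb_check_encoding($cp1252, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)

// Converting data that is already UTF-8 one more time changes the bytes
// (double encoding), which is the "one time too many" case.
$twice = mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
var_dump($twice === $utf8);                    // bool(false)
```

If the first check returns false for the real column data, the stored bytes (or the connection charset) are not UTF-8 and need exactly one conversion.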

1 Answer

Have you tried mb_convert_encoding()?

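If I understood you correctly, your XML is stored in some other encoding and you need UTF-8? The third argument is the source encoding. The 0x96 byte in your warning is an en dash in Windows-1252, so that is a likely candidate:

mb_convert_encoding($dataArray['html_data'], 'UTF-8', 'Windows-1252')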
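
Here is a minimal sketch of feeding the converted string into DOMDocument. It assumes Windows-1252 as the source encoding, which is only a guess from the 0x96 byte in the warning, and `$raw` is a hypothetical stand-in for `$dataArray['html_data']`.

```php
<?php
// Hypothetical stand-in for $dataArray['html_data']: XML stored as Windows-1252 bytes.
$raw = "<p>yes \x96 no</p>";

// Convert only when the bytes are not already valid UTF-8, to avoid double encoding.
// Assumption: anything that is not valid UTF-8 is Windows-1252.
$xml = mb_check_encoding($raw, 'UTF-8')
    ? $raw
    : mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

$doc = new DOMDocument();
$ok  = $doc->loadXML($xml);

var_dump($ok);                                                      // bool(true)
var_dump($doc->documentElement->textContent === "yes \xE2\x80\x93 no"); // bool(true)
```

The validity check is a heuristic: it prevents re-encoding data that is already UTF-8, but it cannot tell which legacy encoding the bytes came from, so the source encoding still has to be known or inferred.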