I have a PHP script that imports and parses XML files and saves the data into the database:
- Database collation: utf8_general_ci, charset: utf8
- Page's charset: utf-8
- XML files: ANSI, contains smart quotes (from MS Word)
So during import I do a utf8_encode() on the text from the XML files prior to saving into the database and subsequently displaying on the page.
But after the data is successfully imported and saved into the DB:
- Database: smart quotes are saved as a ? character (viewed from CMD)
- Page: smart quotes are displayed as boxes
Any ideas as to why the smart quotes are not being converted correctly, even when using utf8_encode()?
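
One likely explanation, assuming the "ANSI" files are really Windows-1252: utf8_encode() treats its input as ISO-8859-1, where bytes 0x91 to 0x94 are control characters rather than curly quotes, so they end up as unprintable code points in the UTF-8 output. A minimal sketch of the difference follows; the sample string is made up for illustration, and the mbstring and iconv extensions are assumed to be available.

```php
<?php
// Made-up sample with Windows-1252 smart quotes:
// 0x93 = left double quote, 0x94 = right double quote.
$cp1252 = "\x93Hello\x94";

// Equivalent to what utf8_encode() does: read the bytes as ISO-8859-1.
// mb_convert_encoding() is used because utf8_encode() is deprecated as of PHP 8.2.
$wrong = mb_convert_encoding($cp1252, 'UTF-8', 'ISO-8859-1');

// Reading the same bytes as Windows-1252 yields real curly quotes.
$right = iconv('Windows-1252', 'UTF-8', $cp1252);

echo bin2hex($wrong), "\n"; // c29348656c6c6fc294     (U+0093 / U+0094, control characters)
echo bin2hex($right), "\n"; // e2809c48656c6c6fe2809d (U+201C / U+201D, curly quotes)
```

If the files turn out to use a different code page, the source encoding passed to iconv() would need to change accordingly.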
EDIT:
@Tomalak: The XML files are actually .txt, no XML declaration (<?xml ... ?>), and no root element. My script actually adds a root element just so the parser works:
utf8_encode('<article>' . file_get_contents($xmlfile) . '</article>');
It seems like I need to add an XML declaration. If so, what should it look like?
<?xml ... ?>) of your XML files, along with the character code (use a hex editor) the smart quotes have there? - Tomalak
<?xml encoding="Windows-1252"?><article> . file_get_contents($xmlfile) . </article> and remove the utf8_encode() part. Then parse the resulting string with DOMDocument. Just make sure that the encoding declaration matches the bytes in the text file. (At least I suppose it should work this way.) - Tomalak
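
The suggestion above could be sketched roughly as follows. This is only an illustration under a few assumptions: the files really are Windows-1252, PHP's libxml build recognizes that encoding name (typically via iconv), and $raw is a made-up stand-in for file_get_contents($xmlfile). Note that the XML specification requires a version attribute before encoding in the declaration, so it is included here.

```php
<?php
// Hypothetical stand-in for file_get_contents($xmlfile): Windows-1252 bytes,
// no XML declaration, no root element. 0x93/0x94 are curly double quotes.
$raw = "<title>\x93Hello\x94</title>";

// Declare the real byte encoding and add the root element.
// No utf8_encode() call: the parser does the conversion itself.
$xml = '<?xml version="1.0" encoding="Windows-1252"?>'
     . '<article>' . $raw . '</article>';

$doc = new DOMDocument();
if (!$doc->loadXML($xml)) {
    throw new RuntimeException('Could not parse the XML string');
}

// The DOM API returns strings as UTF-8, whatever the declared input encoding was.
$title = $doc->getElementsByTagName('title')->item(0)->textContent;

echo $title, "\n";
```

With this approach the value read from the DOM is already UTF-8, so it can be saved without a further utf8_encode() step.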