I am trying to finish exporting a 1,000-article website (ASP, SQL Server) with categories and tags into a WordPress blog. The articles were originally written in Microsoft Word and include many non-ASCII characters such as curly quotes and accented letters. They were then copied and pasted into Microsoft Access. The articles are currently stored in a SQL Server 2008 database and displayed on a website using the iso-8859-1 charset.
I am using the default WordPress import/export XML format (a WordPress eXtended RSS (WXR) file), which I modelled on the file WordPress produces when exporting a blog. This file requires UTF-8 encoding.
My problem is that the iso-8859-1 characters break the importer, so many articles are not fully imported. Examples are accented words such as naïve and curly quotes such as “ ’.
My question is: how do I clean up all the text? I can write a replace function to fix the curly quotes, but there will always be a stray word like naïve that causes a problem.
What is the simplest way to convert the encoding of all the text from iso-8859-1 to UTF-8?
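To make the question concrete, here is a minimal sketch in Python of the kind of conversion I have in mind. It assumes the article text can be read out of the database as raw bytes, and that the curly quotes are really Windows-1252 bytes (which is what Word normally produces) rather than true iso-8859-1. The function name `to_utf8` is just a placeholder for illustration.

```python
def to_utf8(raw: bytes) -> bytes:
    """Decode legacy single-byte text and re-encode it as UTF-8."""
    try:
        # Windows-1252 matches ISO-8859-1 for bytes 0xA0-0xFF (so "ï" works)
        # and additionally defines curly quotes in the 0x80-0x9F range.
        text = raw.decode("cp1252")
    except UnicodeDecodeError:
        # A few bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined in
        # Windows-1252; ISO-8859-1 maps every byte, so it never fails.
        text = raw.decode("iso-8859-1")
    return text.encode("utf-8")


if __name__ == "__main__":
    # "naïve" and curly quotes as they would be stored in the legacy encoding.
    sample = b"na\xefve \x93quoted\x94 it\x92s"
    print(to_utf8(sample).decode("utf-8"))
```

This only changes the byte encoding. Characters that are special in XML, such as & and <, would still need escaping separately before going into the WXR file.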