I recently stumbled upon a MySQL database that was encoded in Latin1 and was rendering question mark symbols when viewed in a browser. To fix this, we changed the encoding of the DB to utf8 and the collation to utf8_general_ci on all of our tables, but the data already stored still showed up as question marks. All storing and retrieval of data between MySQL and the browser was done by PHP. I made sure UTF-8 was used in PHP as well, and even ran SET NAMES utf8 as many people suggested online. The problem is that I have now ended up with weird characters such as Ñ in strings we knew didn't have them.
Examples of data
Stored:
EMMANUEL PE\xc3\u0192\xc2\u2018A GOMEZ PORTUGAL
Rendered:
EMMANUEL PEÑA GOMEZ PORTUGAL
Proper:
EMMANUEL PEÑA GOMEZ PORTUGAL
Stored:
Luis Hern\xe1ndez-Higareda
Rendered:
Luis Hernández-Higareda
Proper:
Luis Hernández-Higareda
Stored:
Teresa de Jes\xc3\u0192\xc2\xbas Galicia G\xc3\u0192\xc2\xb3mez
Rendered:
Teresa de Jesús Galicia Gómez
Proper:
Teresa de Jesús Galicia Gómez
Stored:
DR. JOS\xc3\u0192\xc2\u2030 ABEN\xc3\u0192\xc2\x81MAR RIC\xc3\u0192\xc2\x81RDEZ GARC\xc3\u0192\xc2\x8dA
Proper:
DR. JOSÉ ABENÁMAR RICÁRDEZ GARCÍA
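The Stored values look like the proper UTF-8 text misread as a Latin-1 family encoding twice. The sketch below reproduces them under one assumed sequence: first an ISO-8859-1 misread, then a cp1252-style misread that passes the bytes cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) through unchanged. That second rule is an assumption about MySQL's latin1, suggested by the \xc2\x81 and \xc2\x8d sequences in the last example. `decode_mysql_latin1` and `double_encode` are hypothetical helpers for diagnosis only, not a claim about exactly what PHP or MySQL did.

```python
def decode_mysql_latin1(data: bytes) -> str:
    """Decode bytes cp1252-style, as MySQL's latin1 is assumed to do here.

    Python's cp1252 codec rejects the five bytes cp1252 leaves undefined
    (0x81, 0x8D, 0x8F, 0x90, 0x9D); those are mapped straight to the code
    point with the same value (e.g. 0x81 becomes U+0081).
    """
    chars = []
    for b in data:
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(b))
    return "".join(chars)


def double_encode(proper: str) -> str:
    """Hypothetical reconstruction of how a proper name became the Stored value."""
    # Layer 1: the UTF-8 bytes are misread as ISO-8859-1 (Ñ becomes "Ã\x91").
    layer1 = proper.encode("utf-8").decode("latin-1")
    # Layer 2: that text is written out as UTF-8 and misread again, cp1252-style.
    return decode_mysql_latin1(layer1.encode("utf-8"))


print(ascii(double_encode("PEÑA")))  # 'PE\xc3\u0192\xc2\u2018A'
```

Applying the same two layers to JOSÉ, ABENÁMAR, Jesús and Gómez also yields exactly the Stored sequences above, which suggests the characters were mis-decoded rather than lost.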
Currently I'm using Python to get the data from the DB, and I'm trying to normalize it to UTF-8 Unicode, but I'm really lost; that's as far as I've gotten. I need to convert what currently shows up as weird characters into readable text, as shown above.
What am I missing here? Is the data unrepairable?
Functions https://gist.github.com/2649463
Note: of all the examples, one is rendering properly (I left it in so it is taken into consideration in any advice on how to fix this).
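If the stored values really are layers of UTF-8 misread as Latin-1, the data looks repairable. A minimal sketch, assuming the strings reach Python exactly as written in the Stored lines: re-encode each string the way MySQL's latin1 is assumed to have decoded it (cp1252, falling back to the raw byte for characters such as U+0081 that Python's cp1252 codec rejects), decode the bytes as UTF-8, and repeat until the text stops changing or is no longer valid UTF-8. `encode_mysql_latin1` and `undo_mojibake` are hypothetical names, and the fallback rule is an assumption, not MySQL's documented API.

```python
def encode_mysql_latin1(text: str) -> bytes:
    """Inverse of a cp1252-style decode; assumed to match MySQL's latin1.

    Characters Python's cp1252 codec cannot encode (e.g. U+0081) fall back
    to their Latin-1 byte. Raises UnicodeEncodeError for a character above
    U+00FF that is not in cp1252 either.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            out += ch.encode("latin-1")
    return bytes(out)


def undo_mojibake(text: str, max_rounds: int = 3) -> str:
    """Peel off layers of UTF-8-read-as-latin1 until the text stops changing."""
    for _ in range(max_rounds):
        try:
            fixed = encode_mysql_latin1(text).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # not (or no longer) mis-decoded UTF-8: leave as-is
        if fixed == text:
            break  # pure ASCII: nothing to undo
        text = fixed
    return text


stored = [
    "EMMANUEL PE\xc3\u0192\xc2\u2018A GOMEZ PORTUGAL",
    "Luis Hern\xe1ndez-Higareda",
    "Teresa de Jes\xc3\u0192\xc2\xbas Galicia G\xc3\u0192\xc2\xb3mez",
    "DR. JOS\xc3\u0192\xc2\u2030 ABEN\xc3\u0192\xc2\x81MAR RIC\xc3\u0192\xc2\x81RDEZ GARC\xc3\u0192\xc2\x8dA",
]
for s in stored:
    print(undo_mojibake(s))
```

Stopping at the first decode failure is what leaves the Hernández example alone: its á is a single \xe1 character, and the byte 0xE1 followed by "n" is not valid UTF-8. The flip side is that a correctly stored string that happens to re-encode into valid UTF-8 would be altered, so it is worth checking the output on a copy before writing anything back to the database.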