
I run a personal website that has always displayed accented characters correctly. Now, suddenly, it no longer does. The odd thing is that even its localhost version is affected.

The application has been unaltered for years in this regard, and here is what it does, in the given order:

  1. mysql database set to collation utf8_general_ci

  2. Application sends these two queries: "SET NAMES 'utf8' COLLATE 'utf8'" and "SET CHARACTER_SET 'utf8'"

  3. PHP sends the following headers before anything is printed: header('Content-type: text/xml; charset=utf-8'."\r\n"); header('Content-transfer-encoding: utf-8'."\r\n");

  4. Each web page shows a meta tag as follows: <meta http-equiv="content-type" content="text/html; charset=utf-8" />

Yet now, suddenly, the chars are shown all wrong. If I manually replace the chars, they are shown as intended. But I cannot fathom whether, or what, may have "corrupted" the database. And I certainly cannot manually fix hundreds of posts.

Any idea why this suddenly happens, and any suggestions on how to fix it?

Example of a wrongly displayed line: "Non ho mai avuto l' opportunitÃ  di incontrarti di persona. Non so se Ã¨ perchÃ¨ non ho cercato abbastanza l' occasione o perchÃ¨" etc.

So the app did not change! But did something else change? Apache, MySQL, or anything else you can think of? – RiggsFolly
Use utf8_unicode_ci instead of utf8_general_ci (stackoverflow.com/questions/766809/…) – Snowman
On my part absolutely nothing changed. The code is the same, and it has worked correctly over the years. Also, the remote database still appears to be set to utf8_general_ci. If the hosting server updated something, I don't know. But it's register.com, and I doubt they would make updates that can affect charsets; they serve a pool of clients from far too many countries to afford an error in charset handling. – alberto
Have you tried a different browser? You can force any browser to interpret all pages as Latin-1 or whatnot, which could cause the phenomenon. – YetiCGN
@Snowman - those two collations only differ in multi-character (not multi-byte) utf8 sequences. – Rick James

1 Answer


Ã¨ is Mojibake for è. Were you expecting a grave-e? Regardless of what changed or did not change, let's look at fixing it.
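
For illustration, here is a minimal Python sketch of how this kind of Mojibake arises. It assumes that the UTF-8 bytes are being interpreted as latin1 somewhere between the database and the browser; the function name `mojibake` is made up for this example and is not part of the application.

```python
def mojibake(text: str) -> str:
    """Encode text as UTF-8, then misread those bytes as latin1."""
    utf8_bytes = text.encode("utf-8")   # "è" becomes the two bytes C3 A8
    return utf8_bytes.decode("latin-1")  # each byte is shown as its own character


print(mojibake("è"))       # Ã¨
print(mojibake("perchè"))  # perchÃ¨
```

Each accented character turns into two characters, which matches the symptom in the question.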

See this and look for Mojibake. It says to check/fix these:

  • The bytes to be stored need to be UTF-8-encoded. Fix this.
  • The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4. Fix this.
  • The column needs to be declared CHARACTER SET utf8 (or utf8mb4). Fix this.
  • HTML should start with <meta charset=UTF-8>.
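
The first item in the checklist (the bytes must be UTF-8-encoded) can be sanity-checked outside the database. The following is only a sketch in Python, assuming you can get hold of the raw bytes the application sends; `is_valid_utf8` is a hypothetical helper name.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(b"perch\xc3\xa8"))  # True: C3 A8 is è in UTF-8
print(is_valid_utf8(b"perch\xe8"))      # False: a lone E8 is latin1, not UTF-8
```

Note that double-encoded text is also valid UTF-8, so passing this check does not by itself prove the data is correct.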

Also see the technique for checking the HEX of what is stored for è:

  • utf8 hex is C3A8.
  • Hex C383C2A8 means you have "double encoding"; that will lead to other issues.
  • E8 is the latin1 hex -- I doubt if you will see this.
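
The three hex values above can be reproduced and told apart with a short Python sketch. It assumes you have already obtained the hex of the stored bytes for a single è (for example via MySQL's HEX() function); `diagnose` is a hypothetical helper written for this illustration.

```python
def diagnose(hex_value: str) -> str:
    """Classify the stored bytes of a single è from their hex representation."""
    data = bytes.fromhex(hex_value)
    correct = "è".encode("utf-8")                      # C3 A8
    # Double encoding: the UTF-8 bytes were misread as latin1, then encoded as UTF-8 again.
    doubled = correct.decode("latin-1").encode("utf-8")  # C3 83 C2 A8
    latin1 = "è".encode("latin-1")                     # E8
    if data == correct:
        return "utf8"
    if data == doubled:
        return "double-encoded"
    if data == latin1:
        return "latin1"
    return "unknown"


print(diagnose("C3A8"))      # utf8
print(diagnose("C383C2A8"))  # double-encoded
print(diagnose("E8"))        # latin1
```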