7
votes

For a website to accept user-submitted content that may not be in English (e.g. Japanese) and save it to the database, is it in my best interest to utf8_encode all new content and utf8_decode it when retrieving it later?

Further info: I am using Doctrine, and I get an error when attempting to save or select Unicode characters in the MySQL database:

SQLSTATE[HY000]: General error: 1267 Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='

Please include your table schema, and if possible the full SQL. Hint: this looks like a comparison issue. - ajreal
Needs more info: what collation(s) are your tables using? - Pekka
Possible duplicate of "UTF-8 all the way through". - mercator

2 Answers

8
votes

You don't need to use the encode function. What you need to do is make sure you are UTF-8 end to end. It looks like your database might be using the latin1 encoding and collation. Your connection to the database also needs to be UTF-8. Sometimes that is simply a matter of executing a SET NAMES utf8 query right after you establish a connection.
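To illustrate why the encode function is unnecessary (and can do harm) when the input is already UTF-8, here is a minimal sketch. It assumes the mbstring extension is available and uses mb_convert_encoding() from ISO-8859-1 to UTF-8, which performs the same conversion as utf8_encode(). The sample character and variable names are illustrative only.

```php
<?php
// The character "日" encoded as UTF-8 is the three bytes E6 97 A5.
$utf8 = "\xE6\x97\xA5";

// Converting from ISO-8859-1 to UTF-8 treats each byte as a separate
// Latin-1 character, so text that is already UTF-8 gets double-encoded.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // e697a5
echo bin2hex($double), "\n"; // c3a6c297c2a5
```

The second value is what would end up in the database if already-UTF-8 form input were passed through the encode step: six bytes of mojibake instead of one character. Going the other way, utf8_decode() targets Latin-1, which has no Japanese characters, so it cannot round-trip such text at all.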

Running the following command in MySQL will likely resolve the error you see above, but you still need to be UTF-8 end to end. Then you don't need to do anything special with your data.

    ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

5
votes

Brent is right. It needs to be end-to-end. Here's my list:

Apache config:
    AddDefaultCharset UTF-8
    AddCharset UTF-8  .utf8

php.ini:
    default_charset = "utf-8"

MySQL:
    ALTER DATABASE DEFAULT CHARACTER SET utf8;
    ALTER TABLE SomeTableName DEFAULT CHARACTER SET utf8;

PHP/HTML:
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    …
    <form … <input type="text" name="some_field" value="<?php echo htmlspecialchars($row['some_field'], ENT_COMPAT, 'UTF-8'); ?>"…

This last one seems the most important. Run this query immediately after the mysql_connect() call:
    mysql_query("SET NAMES 'utf8'");
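The mysql_* functions were removed in PHP 7, so on current PHP versions the same idea can be expressed through PDO, whose MySQL DSN accepts a charset option. The sketch below is one possible way to do it, not the only one. The helper names build_mysql_dsn and connect_utf8, the host, and the database name are hypothetical. It also assumes utf8mb4 rather than the utf8 used above, because MySQL's utf8 stores at most three bytes per character; pass 'utf8' instead if the tables use that character set.

```php
<?php
// Hypothetical helper: builds a PDO DSN that requests a UTF-8 connection.
// The charset default (utf8mb4) is an assumption, not taken from the thread.
function build_mysql_dsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

// Hypothetical helper: opens the connection with the charset set in the DSN,
// so no separate SET NAMES query is needed. Credentials are placeholders.
function connect_utf8(string $host, string $dbname, string $user, string $password): PDO
{
    return new PDO(build_mysql_dsn($host, $dbname), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // throw on SQL errors
    ]);
}

echo build_mysql_dsn('localhost', 'example_db'), "\n";
// mysql:host=localhost;dbname=example_db;charset=utf8mb4
```

Setting the charset in the DSN keeps the connection encoding in one place, so it cannot be forgotten after a reconnect the way a manual SET NAMES call can.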