1
votes

My database is in latin1 and collation is latin1_swedish_ci. Also in my php file I'm using iso-8859-1. But If I store some characters like 'é' and some others I'm having problem in dowloading the content. So we have added the default character-set to utf8 and skip-character-set-client-handshake in my.cnf file. Also added 'set-name utf8' in query soon after every connection establishment and before executing any query. This solved the issue. But I have done some observations as below

Case 1:
Mysql conf file: No default character-set and no skip-character-set-client-handshake
Query: No set names
Result: 
mysql> show variables like '%charac%';<br/>
+--------------------------+----------------------------+<br/>
| Variable_name            | Value                      |<br/>
+--------------------------+----------------------------+<br/>
| character_set_client     | latin1                     |<br/>
| character_set_connection | latin1                     |<br/>
| character_set_database   | latin1                     |<br/>
| character_set_filesystem | binary                     |<br/>
| character_set_results    | latin1                     |<br/>
| character_set_server     | latin1                     |<br/>
| character_set_system     | utf8                       |<br/>
| character_sets_dir       | /usr/share/mysql/charsets/ |<br/>
+--------------------------+----------------------------+<br/>

stored ->é(utf8)(Hex->C3A9)<br/>


Case2:
Mysql conf file: Default character-set - utf8 and no skip-character-set-client-handshake
Query: No set names<br/>
Result: 
mysql> show variables like '%charac%';<br/>
+--------------------------+----------------------------+<br/>
| Variable_name            | Value                      |<br/>
+--------------------------+----------------------------+<br/>
| character_set_client     | latin1                     |<br/>
| character_set_connection | latin1                     |<br/>
| character_set_database   | utf8                       |<br/>
| character_set_filesystem | binary                     |<br/>
| character_set_results    | latin1                     |<br/>
| character_set_server     | utf8                       |<br/>
| character_set_system     | utf8                       |<br/>
| character_sets_dir       | /usr/share/mysql/charsets/ |<br/>
+--------------------------+----------------------------+<br/>
<br/>
stored ->é(utf8)(Hex->C3A9)<br/>


Case3:
Mysql conf file: Default character-set - utf8 and skip-character-set-client-handshake
Query: No set names<br/>
Result: 
mysql> show variables like '%charac%';<br/>
+--------------------------+----------------------------+<br/>
| Variable_name            | Value                      |<br/>
+--------------------------+----------------------------+<br/>
| character_set_client     | utf8                       |<br/>
| character_set_connection | utf8                       |<br/>
| character_set_database   | utf8                       |<br/>
| character_set_filesystem | binary                     |<br/>
| character_set_results    | utf8                       |<br/>
| character_set_server     | utf8                       |<br/>
| character_set_system     | utf8                       |<br/>
| character_sets_dir       | /usr/share/mysql/charsets/ |<br/>
+--------------------------+----------------------------+<br/>
8 rows in set (0.00 sec)<br/>

stored ->é(latin1)(Hex->E9)<br/>


Case4:
Mysql conf file: no Default characterset - utf8 and skip-character-set-client-handshake
Query: No set names<br/>
Result: 
mysql> show variables like '%charac%';<br/>
+--------------------------+----------------------------+<br/>
| Variable_name            | Value                      |<br/>
+--------------------------+----------------------------+<br/>
| character_set_client     | utf8                       |<br/>
| character_set_connection | utf8                       |<br/>
| character_set_database   | utf8                       |<br/>
| character_set_filesystem | binary                     |<br/>
| character_set_results    | utf8                       |<br/>
| character_set_server     | utf8                       |<br/>
| character_set_system     | utf8                       |<br/>
| character_sets_dir       | /usr/share/mysql/charsets/ |<br/>
+--------------------------+----------------------------+<br/>
8 rows in set (0.00 sec)<br/>

stored ->é(utf8)(Hex->C3A9)<br/>


Case5:
Mysql conf file: Default characterset - utf8 and skip-character-set-client-handshake
Query: set names utf8<br/>
Result: 
mysql> show variables like '%charac%';<br/>
+--------------------------+----------------------------+<br/>
| Variable_name            | Value                      |<br/>
+--------------------------+----------------------------+<br/>
| character_set_client     | utf8                       |<br/>
| character_set_connection | utf8                       |<br/>
| character_set_database   | utf8                       |<br/>
| character_set_filesystem | binary                     |<br/>
| character_set_results    | utf8                       |<br/>
| character_set_server     | utf8                       |<br/>
| character_set_system     | utf8                       |<br/>
| character_sets_dir       | /usr/share/mysql/charsets/ |<br/>
+--------------------------+----------------------------+<br/>
8 rows in set (0.00 sec)<br/>

stored ->é(latin1)(Hex->E9)<br/>


Case6:
Mysql conf file: Default characterset - utf8 and no skip-character-set-client-handshake
Query: set names utf8<br/>
Result: 
mysql> show variables like '%charac%';<br/>
+--------------------------+----------------------------+<br/>
| Variable_name            | Value                      |<br/>
+--------------------------+----------------------------+<br/>
| character_set_client     | latin1                     |<br/>
| character_set_connection | latin1                     |<br/>
| character_set_database   | utf8                       |<br/>
| character_set_filesystem | binary                     |<br/>
| character_set_results    | latin1                     |<br/>
| character_set_server     | utf8                       |<br/>
| character_set_system     | utf8                       |<br/>
| character_sets_dir       | /usr/share/mysql/charsets/ |<br/>
+--------------------------+----------------------------+<br/>
8 rows in set (0.00 sec)<br/>

stored ->é(latin1)(Hex->E9)<br/>


Case7:
Mysql conf file: no Default characterset  and no skip-character-set-client-handshake
Query: set names utf8<br/>
Result: 
mysql> show variables like '%charac%';<br/>
+--------------------------+----------------------------+<br/>
| Variable_name            | Value                      |<br/>
+--------------------------+----------------------------+<br/>
| character_set_client     | latin1                     |<br/>
| character_set_connection | latin1                     |<br/>
| character_set_database   | latin1                     |<br/>
| character_set_filesystem | binary                     |<br/>
| character_set_results    | latin1                     |<br/>
| character_set_server     | latin1                     |<br/>
| character_set_system     | utf8                       |<br/>
| character_sets_dir       | /usr/share/mysql/charsets/ |<br/>
+--------------------------+----------------------------+<br/>
8 rows in set (0.00 sec)<br/>

stored ->é(latin1)(Hex->E9)<br/>


Case8:
Mysql conf file: no Default characterset  and  skip-character-set-client-handshake
Query: set names utf8<br/>
Result: 
mysql> show variables like '%charac%';<br/>
+--------------------------+----------------------------+<br/>
| Variable_name            | Value                      |<br/>
+--------------------------+----------------------------+<br/>
| character_set_client     | latin1                     |<br/>
| character_set_connection | latin1                     |<br/>
| character_set_database   | latin1                     |<br/>
| character_set_filesystem | binary                     |<br/>
| character_set_results    | latin1                     |<br/>
| character_set_server     | latin1                     |<br/>
| character_set_system     | utf8                       |<br/>
| character_sets_dir       | /usr/share/mysql/charsets/ |<br/>
+--------------------------+----------------------------+<br/>
8 rows in set (0.00 sec)<br/>

stored ->é(latin1)(Hex->E9)<br/>



Output containing all the 8 cases together<br/>
+-----------+------------------------------------------------------------------+<br/>
| HEX(name) | desc                                                             |<br/>
+-----------+------------------------------------------------------------------+<br/>
| C3A9      | no skip handshake and no default  in conf and nothing in query   |<br/>
| C3A9      | no skip handshake and default utf8 in conf and nothing in query  |<br/>
| E9        | skip handshake and default utf8 in conf and nothing in query     |<br/>
| C3A9      |  skip handshake and no default  in conf and nothing in query     |<br/>
| E9        | skip handshake and default utf8 in conf and utf8 in query        |<br/>
| E9        | no skip handshake and default utf8 in conf and utf8 in query     |<br/>
| E9        | no skip handshake and no default  in conf and utf8 in query      |<br/>
| E9        |  skip handshake and no default  in conf and utf8 in query        |<br/>
+-----------+------------------------------------------------------------------+<br/>

On what basis the data is being stored in database? Sometimes its stored in latin1 format and sometimes in utf8. Is it based on options(I mean variables like character_set_client, character_set_server etc...,) or On my.cnf configuration??

Considering all the 8 cases I'm not getting a conclusion on this. I have also gone through the explaination on 'SET Names', 'skip-character-set-client-handshake'. But still in confusion. Is there any conversion happens while storing or just while displaying??

The above observations are done using a php script containing below lines.

$conn = mysqli_connect('<host>', '<username>', '<password>', 'table');<br/>
mysqli_query($conn, "SET NAMES 'utf8';");<br/>
mysqli_query($conn, 'insert into router.test values ("é");');<br/>

Thanks in advance for the replies.

3
How to create a MINIMAL, Complete, and Verifiable example might be helpful to improve your question.Geshode
Okay. I just want to know based on what character set the data is being stored in database/tables. <br/><br/> Also I want to know the answer for same question based on the below condition<br/> 1. adding default charset as utf8 to my.cnf file<br> 2. adding skip-character-set-client-handshake in my.cnf file<br> 3. adding 'set-names utf8' in query soon after every connection establishment and before executing any query.<br/>Shreelakshmi G
General tip: always set encoding explicitly. System-wide defaults are a nice commodity but should never be only defence against data corruption. Plus mysqli has mysqli_set_charset().Álvaro González
Please look at the definition of your router table. SHOW CREATE TABLE router is likely to get you want you want.O. Jones
You keep replying to requests for more information by making the incorrect assertion that the requested information is irrelevant. It isn't irrelevant.O. Jones

3 Answers

2
votes

My database is in latin1 and collation is latin1_swedish_ci.

Those are default settings. After you've created columns and tables, each column has a character set and a collation. Changing the defaults does not change existing column and table definitions.

in my php file I'm using iso-8859-1.

That is ok. Latin1 === iso-8859-1

But If I store some characters like 'é' and some others I'm having problem in dowloading the content.

I guess you mean the two-step process of retrieving the content from your table and then sending it to a web browser for rendering. It's the second step that might be wrong. Try setting this in the head section in your php / html file.

<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1">

If you set that and everything renders correctly, you're done. You can also set it to unicode and see what happens. If you don't set it, browsers try to guess, using arcane rules.

So we have added the default character-set to utf8 and skip-character-set-client-handshake in my.cnf file.

Again, that doesn't change your existing columns in your tables.

On what basis the data is being stored in database?

Again, according to the character set chosen for each column.

Sometimes its stored in latin1 format and sometimes in utf8.

It's true.

Best practice: mention the character set and collation for each table as you create it. If you have exceptions for some columns, mention them for the columns as you create them

Best practice: for new databases use utfmb4.

Best practice: always set your database connection character set.

Best practice: read about how to create and manipulate unicode strings in php.

Best practice: when standing up a new MySQL server, set the server-wide defaults to utfmb4 and utfmb4_general_ci.

Unfortunately, migrating an existing php application to unicode can be a pain in the neck.

1
votes

Your client sends either C3A9 or E9 -- independent of anything that MySQL says.

my.cnf and/or SET NAMES and/or the connection parameters determine the 3 of the values in SHOW VARIABLES LIKE 'character_set_%'.

Those settings say whether to interpret C3A9 or E9 as latin1 or utf8:

C3A9 interpreted as utf8: good (é) C3A9 interpreted as latin1: Mojibake (é) E9 interpreted as latin1: good (é) E8 interpreted as utf8: String is truncated due to non-utf8 byte

But we are not finished...

If you are INSERTing, then the Server looks at the encoding of the target column and converts from the above (latin1 or utf8) to the column's declaration. If same, no conversion needed, if different, conversion occurs during store. I'm surprised you didn't stumble on "double encoding" wherein é is stored. This happens when utf8 bytes (C3A9) are mis-declared to be latin1 via SET NAMES (etc), then stored into a utf8 column (hence another conversion).

More discussion: Trouble with UTF-8 characters; what I see is not what I stored and http://mysql.rjweb.org/doc.php/charcoll

0
votes

Thanks for the response and links @Rick James.

Actually I got the answer for my question from some other link and also from above shared link by Rick James. This satisfies all the above 8 cases.

If there is a discrepancy between the connection(utf8) and the storage character-set(latin1), MySQL converts the content from one encoding to another.

Case 1: Here we are actually writing UTF-8 characters to the latin1 database, each of the UTF-8 byte sequences will be interpreted as separate latin1 characters. Each byte composing an UTF-8 character(é) is interpreted as a separate latin1 characters and each character is converted to UTF-8 when writing to the table(à © -> Ã © -> c3 A9)

i.e., the string inserted is é, in UTF-8 (é) the character is represented as two bytes, hexadecimally denoted as C3A9. As we insert UTF-8 data into this table it just treats the two bytes sent as two latin1 characters, does no conversion, and saves them as such into the table. Before doing SET-NAMES to utf8, its inserting in the format C3A9

mysql> insert into test.test values ('é');

Query OK, 1 row affected (0.00 sec)

mysql> select hex(text) from test;

+-----------+
| hex(text) |
+-----------+
| C3A9 |
+-----------+
1 row in set (0.00 sec)

Case 2: When we do SET-NAMES to utf8 whatever data we try to insert to table it will consider it as utf8, as the targeted column(text) is of type latin1, it will convert the utf8 data(2bytes -> C3E9) to latin1 (1byte -> E9).

mysql> set names 'utf8';
Query OK, 0 rows affected (0.00 sec)

mysql> insert into test.test values ('é');
Query OK, 1 row affected (0.00 sec)

mysql> select hex(text) from test;
+-----------+
| hex(text) |
+-----------+
| C3A9 |
| E9 |
+-----------+
2 rows in set (0.00 sec)

So after doing SET-NAMES to utf8 whatever data sent from client especially characters like é are converted and stored in latin1 format itself and not treated as 2 separate latin1 characters.

Also while fetching the data it will convert back to its original format.

SET NAMES indicates what character-set the client will use to send SQL statements to the server. Thus, SET NAMES 'utf8' tells the server, “future incoming messages from this client are in character-set utf8.” It also specifies the character-set that the server should use for sending results back to the client.

A SET NAMES 'charset_name' statement is equivalent to these three statements:

SET character_set_client = charset_name;
SET character_set_results = charset_name;
SET character_set_connection = charset_name;