0
votes

I'm having some difficulty understanding differences in logic between my.cnf settings and actual database settings.

For example, my my.cnf has the following:

init_connect                    = 'SET collation_connection=utf8_unicode_ci'
init_connect                    = 'SET NAMES utf8'
character-set-server            = utf8
collation-server                = utf8_unicode_ci

And when I log into the database without specify the schema I see the following collation variables set:

mysql> show variables like '%coll%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_unicode_ci |
| collation_server     | utf8_unicode_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)

I have a few questions here:

1) Why is collation_connection set to utf8_general_ci when I specify utf8_unicode_ci in my my.cnf file?

2) Why, when I create tables is the default collation utf8_general_ci?

I'm having some difficulty understanding how these collation variables are interpreted and used.

2
Keep in mind that root skips init_connect during its login. - Rick James

2 Answers

2
votes

Several character set and collation system variables relate to a client's interaction with the server.

The server uses the character_set_connection and collation_connection system variables, then it converts statements sent by the client from character_set_client to character_set_connection (except for string literals that have an introducer such as _latin1 or _utf8). The collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence.

Regarding your second question - the server character set and collation are used as default values if the database character set and collation are not specified in CREATE DATABASE statements. They have no other purpose. You have to make sure that your database character set is set at the very creation of it (or, you can change it later of course with ALTER DATABASE database_name CHARACTER SET utf8 COLLATE utf8_unicode_ci;)

0
votes

Another way to approach charsets... (This bypasses the question posed and provides a 'best practice'.)

  • When connecting to the database, use the connect parameter(s) of the client to announce the encoding for bytes in the client. Alternatively, execute SET NAMES after connecting.
  • Be sure that each column has the desired CHARACTER SET and, optionally, the desired COLLATION. This establishes the encoding stored in the columns. It is usually sufficient to specify these on the CREATE TABLE and let the columns inherit it. (The table defaults to the database, which defaults to character_set_server, but it confusing, maybe unwise, to depend on this tree of defaults.)

SET NAMES sets exactly these three 'variables':

character_set_client
character_set_connection
character_set_results

Think of them as saying what encoding exists in the client, regardless of what CHARACTER SET exists on each column. Conversion will occur, as needed, on INSERT and SELECT.

(This answer should not be in conflict with @bodi0's.)