1
votes

I use a Python program to write text containing Unicode characters to a MySQL database. As an example, two of the characters are

u'\u2640' a symbol for Venus or female
u'\u2642' a symbol for Mars or male

I use utf8mb4 for virtually all character sets involved with MySQL. Here is an excerpt from /etc/mysql/my.cnf

[client]
default-character-set=utf8mb4

[mysql]
default-character-set=utf8mb4

[mysqld]
default-character-set=utf8mb4
character-set-server =utf8mb4
character_set_system =utf8mb4

In addition, all tables are created with these parameters:

ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

In all respects except one, the treatment of Unicode works just fine. I can write Unicode to database tables, read it, display it, etc., with no problems. The exception is mysql, the MySQL Command-Line Tool. When I execute a SELECT statement to see rows in a table containing the Venus and Mars Unicode characters, here is what I see on the screen:

| Venus     | ♀      |
| Mars      | ♂      |

What I should see in the right column are the standard glyphs for Venus and Mars.

Any ideas about how to get the MySQL Command-Line Tool to display Unicode properly?

Edit:

I have done a fair amount of research into the various MySQL system variables, etc., and I now realize that the my.cnf settings shown above have some serious issues. In fact, the server, mysqld, would not launch with the settings shown. To correct things, remove these from [mysqld]:

default-character-set=utf8mb4
character-set-system=utf8mb4

I'm not sure that the [client] option does anything, but it doesn't seem to hurt.

In Python u'\u2640' represents a single Unicode character, namely "♀". This compiles down to three bytes containing the hex value E29980. I am having no problems at all encoding and decoding Unicode. The correct values are being stored in a MySQL table; they are correctly read from the table, and when displayed by a Python program they show up like this:

♀   Venus
♂   Mars

The program output can be redirected to a file, processed by a text editor, etc., and in all cases the correct Unicode symbol is displayed.

There is only one place where the correct Unicode symbol is not displayed, and that is when I am using the MySQL Command Line Tool. When I issue a SELECT statement on the table containing the Unicode symbols I get the junk shown above. This is not a Windows specific issue. I have exactly the same problem with the MySQL Command Line Tool when I run it on Windows, Mac OS X, and Ubuntu.

3
I suggest using the HEX function to find out what bytes are actually stored in the column. SELECT symbol_name, HEX(symbol_bytes) FROM ... For the Venus unicode character, properly encoded in UTF8, we'd expect E29980.spencer7593

3 Answers

2
votes

Windows cmd and utf8. If you are talking about Windows, then chcp 65001, plus picking the right font is sufficient. See details.

Mojibake. But, on the other hand, if you are complaining about "Mojibake" such as ♀ instead of , then see Mojibake in here. The hex for Venus (aka Female Sign), when correctly stored in utf8 will be E29980. If you see C3A2 E284A2 E282AC, you have "double encoding", not simply Mojibake.

Do not use u'\u2640' anywhere in MySQL.

1
votes

I had the same issue querying Wordpress with the mysql command line program. You can see characters properly output to terminal when using the --default-character-set=utf8mb4 option.

For example

$ mysql --default-character-set=utf8mb4 -D my_database -e "SELECT option_value FROM wp_options WHERE option_name = 'myoption'"

If you want to configure this for the user, simply edit ~/.my.cnf and add the following to the [client] section:

[client]
default-character-set = utf8mb4

Everything works for us this way.

1
votes

I am somewhat embarrassed to report that there never was a problem with the MySQL command-line tool displaying Unicode characters. Why did I think there was?

I wrote a number of Python 2 programs using MySQLdb to communicate with MySQL. My data involved Unicode characters such as the symbols for Mars and Venus. I was able to write these Unicode characters to the database, read them back, and, in general, operate on them just like any other characters.

There was one annoyance: Using the MySQL command-line tool, when I SELECTed rows from tables containing symbols like Mars and Venus, I only saw junk. That is what led me to my original post asking how I could get Unicode to display properly. I never got a satisfactory answer.

Recently I began converting the Python 2 programs to Python 3, using pymysql to communicate with MySQL. Immediately, I ran into problems. The Unicode characters I was reading from the database seemed all wrong. Investigations showed that, in fact, the bytes stored in the database (created with Python 2) did NOT form the correct utf8 sequences for the Unicode characters I was using.

I converted the Python 2 program which created the tables to Python 3, recreated the tables, and, presto, changeo, everything worked. In other words, characters in the database had been wrong from day One, but when read by a Python 2 program, the original Unicode characters were recreated properly.

And, of course, suddenly, the MySQL command-line tool began displaying Unicode characters just fine. The problem had been that the bytes in the database created by Python 2 and MySQLdb were not proper utf8 representations of the characters I was storing. I do not know exactly what the bytes were, and I have been dealing with this issue too long to spend time trying to find out.

For anyone working with Unicode in MySQL, I recommend this article. It shows all the MySQL parameters which must be set up for Unicode, and it shows how you can view the parameters on your own MySQL installation.