I have a csv file containing some characters that lie outside Unicode BMP, for example the character ????
. They are SMP characters, so they need to be stored in utf8mb4
charset and utf8mb4_general_ci
collation in MySQL instead of utf8
charset and utf8_general_ci
collation.
So here are my SQL queries.
MariaDB [tweets]> set names 'utf8mb4'; Query OK, 0 rows affected (0.01 sec) MariaDB [tweets]> create table test (a text) collate utf8mb4_general_ci; Query OK, 0 rows affected (0.06 sec) MariaDB [tweets]> insert into test (a) values ('????'); Query OK, 1 row affected (0.03 sec) MariaDB [tweets]> select * from test; +------+ | a | +------+ | ???? | +------+ 1 row in set (0.00 sec)
No warnings. Everything is right. Now I want to load that csv file. For test, the file has only one line.
MariaDB [tweets]> load data local infile 't.csv' into table wzyboy character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n\n' (tweet_id,in_reply_to_status_id,in_reply_to_user_id,retweeted_status_id,retweeted_status_user_id,timestamp,source,text,expanded_urls); Query OK, 1 row affected, 7 warnings (0.01 sec) Records: 1 Deleted: 0 Skipped: 0 Warnings: 7
The warning message is:
| Warning | 1366 | Incorrect string value: '\xF0\x9F\x80\x80' for column 'text' at row 1 |
All my working environments (OS, Termianl, etc) use UTF-8. I have specified utf8mb4
in everyplace I could think up of, and if I manually INSERT INTO
it works just fine. However, when I use LOAD DATA INFILE [...] CHARACTER SET utf8mb4 [...]
it just fails with error "Incorrect string value".