4
votes

Does anyone have any tips or gotchas to look out for when migrating MySQL tables from the default case-insensitive Swedish or ASCII charsets to UTF-8? Some of the projects that I'm involved in are striving for better internationalization, and the database is going to be a significant part of this change.

Before we look to alter the database, we are going to convert each site to use UTF-8 character encoding (from least critical to most) to help ensure all input/output is using the same character set.

Thanks for any help

5 Answers

2
votes

Some hints:

  • Your CHAR and VARCHAR columns will use up to 3 times more disk space. (You probably won't see much disk space growth for Swedish words.)
  • Use SET NAMES utf8 before reading or writing to the database. If you don't do this, you will get partially garbled characters.
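To see what "partially garbled" looks like, here is a small Python sketch of the mismatch. It does not talk to MySQL at all; it only assumes that the client sends UTF-8 bytes while the connection is still treated as latin-1, which is roughly the situation when SET NAMES utf8 is skipped. The sample word is made up for illustration.

```python
# Simulate a client sending UTF-8 bytes over a connection that is
# still interpreted as latin-1 (no SET NAMES utf8 was issued).
text = "smörgåsbord"  # illustrative sample, not from the original post

utf8_bytes = text.encode("utf-8")        # what the client actually sends
misread = utf8_bytes.decode("latin-1")   # how a latin-1 connection reads it

# ASCII letters survive; each accented letter turns into two characters.
print(misread)

# The damage is reversible as long as the bytes were stored untouched.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered == text)
```

Only the non-ASCII characters are affected, which is why the problem often goes unnoticed on sites with mostly English content.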
1
votes

Beware index length limitations. If a table is structured, say:

a varchar(255),
b varchar(255),
key (a, b)

You're going to go past the 1000-byte limit on key lengths (the MyISAM limit), because utf8 reserves up to 3 bytes per character. 255 + 255 is okay, but 255*3 + 255*3 isn't going to work.
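The arithmetic can be checked with a few lines of Python. This is a rough sketch: the helper name key_bytes is hypothetical, it assumes 1 byte per character for the old charset and 3 for utf8, and it ignores the extra length bytes a storage engine may add per column.

```python
MAX_KEY_BYTES = 1000  # the key length limit quoted above


def key_bytes(column_lengths, bytes_per_char):
    """Worst-case size of an index over the given character columns."""
    return sum(length * bytes_per_char for length in column_lengths)


single_byte_key = key_bytes([255, 255], 1)  # 255 + 255
utf8_key = key_bytes([255, 255], 3)         # 255*3 + 255*3

print(single_byte_key, single_byte_key <= MAX_KEY_BYTES)
print(utf8_key, utf8_key <= MAX_KEY_BYTES)

# Total characters that fit in one key at 3 bytes per character.
print(MAX_KEY_BYTES // 3)
```

Running the same check over every multi-column key before converting should show which indexes need shorter columns or a different layout.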

0
votes

Your CHAR and VARCHAR columns will use up to 3 times more disk space.

Only if they're stuffed full of latin-1 with ordinals > 128. Otherwise, the increased space use of UTF-8 is minimal.
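A quick way to get a feel for the real overhead is to compare character counts with UTF-8 byte counts. The sketch below uses made-up sample strings and measures only the encoded text itself, not how any particular storage engine lays out CHAR or VARCHAR columns.

```python
def utf8_size(s):
    """Return (characters, UTF-8 bytes) for a string."""
    return len(s), len(s.encode("utf-8"))


samples = {
    "ascii": "hello world",
    "swedish": "smörgåsbord",
    "japanese": "日本語",
}

sizes = {label: utf8_size(s) for label, s in samples.items()}

for label, (chars, nbytes) in sizes.items():
    print(f"{label}: {chars} chars, {nbytes} bytes")
```

Plain ASCII does not grow at all, Swedish text grows by one byte per accented letter, and only text outside the Latin ranges approaches the 3x figure.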

0
votes

The collations are not always favorable. You'll get umlauts collating to their non-umlauted versions, which is not always correct. You might want to go with utf8_bin, but then everything is case sensitive as well.
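The trade-off can be imitated in Python. This is only an analogy and not MySQL's actual collation algorithm: the hypothetical fold function strips accents and case to mimic an accent-insensitive, case-insensitive comparison, while plain == stands in for a binary collation such as utf8_bin.

```python
import unicodedata


def fold(s):
    """Strip combining marks and case, roughly like a loose collation."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


# Loose comparison: the umlaut and the capital letter are both ignored.
loose_match = fold("Müller") == fold("muller")

# Binary-style comparison: the umlaut matters, but so does case.
binary_umlaut = "Müller" == "Muller"
binary_case = "Müller" == "müller"

print(loose_match, binary_umlaut, binary_case)
```

The loose comparison treats all three spellings as equal, and the binary one treats them all as different, which is the choice described above.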