1
votes

We realized that setting the charset on MySQL 5.7.20 to utf8 doesn't actually store all unicode/emoji characters. We are testing migrating the charset to utf8mb4. In order to test we reviewed the schema before and after the migration for inconsistencies. For instance before migration a dump of db/schema.rb before migration using rake db:schema:dump appears as follows (this is just a snippet):

create_table "events", id: :bigint, unsigned: true, force: :cascade, options: "ENGINE=InnoDB DEFAULT CHARSET=utf8" do |t|
  t.string "event_type"
  t.string "system"

We then migrate with the following commands in a Rails migration:

execute "ALTER TABLE events CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
execute "ALTER TABLE events CHANGE event_type event_type VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
execute "ALTER TABLE events CHANGE system system VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"

And then after migration we see the same snippet as follows:

create_table "events", id: :bigint, unsigned: true, force: :cascade, options: "ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci" do |t|
  t.string "event_type", collation: "utf8_general_ci"
  t.string "system", limit: 191

The new version of the column "system" looks correct, with the adjusted max length and collation apparently now defaulting to utf8mb4_unicode_ci as specified. But "event_type" appears not to have migrated, with no adjustment to max length and the collation still set to utf8.

The migration commands are identical and no errors were shown during the migration.

Why the difference? We migrated many varchar columns, and some came out each way with no apparent pattern. Both columns are indexes.

Is the unchanged event_type max length really a problem as long as we don't actually store anything > 191 chars in it?

We are using Rails 5.1.3 and Ruby 2.4.1.

1

1 Answers

0
votes

Setting table row format to dynamic fixes index issues.