0
votes

I have an RDS MySql with the following settings:

  • Class: db.m5.xlarge
  • Storage: Prosisionned 1000 IOPS (SSD)

I then want to add a few columns to a table that is about 20 GB in size (according to INFORMATION_SCHEMA.files). Here's my statement:

ALTER TABLE MY_TABLE
ADD COLUMN NEW_COLUMN_1 DECIMAL(39, 30) NULL,
ADD COLUMN NEW_COLUMN_2  DECIMAL(39, 30) NULL,
ADD COLUMN NEW_COLUMN_3 INT(10) UNSIGNED NULL,
ADD CONSTRAINT SOME_CONSTRAINT FOREIGN KEY (NEW_COLUMN_3) REFERENCES SOME_OTHER_TABLE(SOME_OTHER_PK),
ADD COLUMN NEW_COLUMN_4 DATE NULL;

This query took 172 minutes to execute. Most of this time was spent coping the data to a temporary table.

During that operation, there were no other queries (read or write) being executed. I had the database just for myself. SHOW FULL PROCESSLIST was saying that State was equal to copy to tmp table for my query.

What I don't understand is that the the AWS RDS Console tells me that the write througput was between 30 MB/s and 35 MB/s for 172 minutes.

enter image description here

Assuming a write througput of 30 MB/s, I should have been able to write 30 * 60 * 172 = 309600 MB = 302 GB. This is much bigger than the size of the temporary table that was created during the opration (20 GB).

So two questions:

  1. what is mysql/rds writing beside my temp table? Is there a way to disable that so that I can get the full bandwidth to create the temp table?
  2. is there any way to accelerate that operation? Taking 3 hours to write 20 GB of data seems pretty long.
2
It might be better to ask this at: dba.stackexchange.com - John Rotenstein

2 Answers

1
votes

I was using MySQL 5.7. According to this MySQL blog post, version 8.0 improved the situation: "InnoDB now supports Instant ADD COLUMN".

I therefore changed my query to use the new feature.

-- Completes in 0.375 seconds!
ALTER TABLE MY_TABLE
ADD COLUMN NEW_COLUMN_1 DECIMAL(39, 30) NULL,
ADD COLUMN NEW_COLUMN_2  DECIMAL(39, 30) NULL,
ADD COLUMN NEW_COLUMN_3 INT(10) UNSIGNED NULL,
-- 'ALGORITHM=INSTANT' is not compatible with foreign keys. 
-- The foreign key will need to be added in another statement
-- ADD CONSTRAINT SOME_CONSTRAINT FOREIGN KEY (NEW_COLUMN_3) REFERENCES SOME_OTHER_TABLE(SOME_OTHER_PK),
ADD COLUMN NEW_COLUMN_4 DATE NULL, 
-- the new option
ALGORITHM=INSTANT;

-- This completed in about 6 minutes. 
-- Adding the foreign creates an index under the hood. 
-- This index was 1.5 GB big.
SET FOREIGN_KEY_CHECKS=0;
ALTER TABLE MY_TABLE
  ADD FOREIGN KEY (NEW_COLUMN_3) REFERENCES SOME_OTHER_TABLE(SOME_OTHER_PK);
SET FOREIGN_KEY_CHECKS=1;

So my conclusions:

  • upgrade to MySQL 8 if you can
  • make sure that you always use (when possible) the ALGORITHM=INSTANT option.
0
votes

InnoDB is probably the storage engine you are using, since it's the default storage engine. InnoDB does some I/O that might seem redundant, to ensure there is no data loss.

For example:

  • Data and index pages modified in the buffer pool must be written to the tablespace. The table may need to split some pages during the process of adding columns, because the rows become wider, and fewer rows fit per page.
  • During writing pages to the tablespace, InnoDB first writes those pages to the doublewrite buffer, to ensure against data loss if there's a crash during a page write.
  • Transactions are written to the InnoDB redo log, and this may even result in multiple overwrites to the same block in the log.
  • Transactions are also written to the binary log if it is enabled for purposes of replication. Though this shouldn't be a big cost in the cast of an ALTER TABLE statement, because DDL statements are always written to the binary log in statement format, not in row format.

You also asked what can be done to speed up the ALTER TABLE. The reason to want it to run faster is usually because during an ALTER TABLE, the table is locked and may block concurrent queries.

At my company, we use the free tool pt-online-schema-change, so we can continue to use the table more or less freely while it is being altered. It actually takes longer to complete the alter this way, but it's not so inconvenient since it doesn't block our access to the table.