1 vote

Is there a way to upgrade from Aurora 1 (MySQL 5.6) to Aurora 2 (MySQL 5.7) without downtime on an active database? This seems like it should be simple, since you can normally perform major version upgrades from either the CLI or the Console, but that is not the case here.

We tried:

  1. Creating a snapshot of the database
  2. Creating a new cluster using Aurora 2 (MySQL 5.7) from the snapshot
  3. Configuring replication to the new cluster from the primary cluster (a rough sketch of steps 1 and 2 follows)
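
For reference, the snapshot-and-restore part (steps 1 and 2) looks roughly like this with boto3; all identifiers, the instance class, and the engine version below are placeholders, not the exact values we used:

```python
import boto3

rds = boto3.client("rds")

# Step 1: snapshot the existing Aurora 1 (MySQL 5.6) cluster.
rds.create_db_cluster_snapshot(
    DBClusterSnapshotIdentifier="pre-upgrade-snapshot",   # placeholder
    DBClusterIdentifier="aurora-56-cluster",              # placeholder
)
rds.get_waiter("db_cluster_snapshot_available").wait(
    DBClusterSnapshotIdentifier="pre-upgrade-snapshot"
)

# Step 2: restore the snapshot into a new Aurora 2 (MySQL 5.7) cluster.
rds.restore_db_cluster_from_snapshot(
    DBClusterIdentifier="aurora-57-cluster",              # placeholder
    SnapshotIdentifier="pre-upgrade-snapshot",
    Engine="aurora-mysql",                                # Aurora 2.x engine name
    EngineVersion="5.7.mysql_aurora.2.07.2",              # placeholder 2.x version
)

# Restoring a cluster creates no instances, so add a writer instance to it.
rds.create_db_instance(
    DBInstanceIdentifier="aurora-57-writer",              # placeholder
    DBClusterIdentifier="aurora-57-cluster",
    DBInstanceClass="db.r5.large",                        # placeholder
    Engine="aurora-mysql",
)
```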

However, because you can't run commands that require SUPER privileges in Aurora, you can't halt writes long enough to capture a consistent binlog position from the master. The result is a flood of SQL errors on the replica that are impossible to skip on an active database.

Also, because Aurora doesn't use binlog replication for its read replicas, I can't simply stop replication on a read replica and capture the binlog position there either.

I have seen this semi-related question, but it certainly requires downtime: How to upgrade AWS RDS Aurora MySQL 5.6 to 5.7

We have since implemented a maintenance window and completed the upgrade while offline, which is unfortunate. You would think RDS Aurora would have a straightforward path for doing this. – Ryan Eastabrook

2 Answers

2 votes

UPDATE: AWS just announced an in-place upgrade option for 5.6 > 5.7: https://aws.amazon.com/about-aws/whats-new/2021/01/amazon-aurora-supports-in-place-upgrades-mysql-5-6-to-5-7/

It's as simple as clicking Modify and choosing a 2.x version. :)
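
If you'd rather script it than click through the console, the upgrade is roughly a single modify_db_cluster call with boto3 (a sketch; the cluster identifier and target engine version are placeholders):

```python
import boto3

rds = boto3.client("rds")

# In-place major version upgrade of an Aurora MySQL 5.6 cluster to 5.7.
# ApplyImmediately starts the upgrade now; omit it to wait for the next
# maintenance window instead.
rds.modify_db_cluster(
    DBClusterIdentifier="aurora-56-cluster",       # placeholder
    EngineVersion="5.7.mysql_aurora.2.07.2",       # placeholder 2.x version
    AllowMajorVersionUpgrade=True,
    ApplyImmediately=True,
)
```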

I tested this (Aurora MySQL 5.6 > 5.7) on a 25 GB database that was many minor versions behind; it took 10 minutes overall, with 8 minutes of downtime. Not zero downtime, but a very easy option, and it can be scheduled in AWS to happen automatically during off-peak hours (the maintenance window).

Additionally, consider RDS Proxy to reduce the impact of downtime. During short windows when the database is unavailable (e.g., a reboot for minor updates), the proxy holds client connections open instead of dropping them, so the outage shows up as a brief spike in latency rather than complete unavailability.
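
A minimal sketch of putting RDS Proxy in front of the cluster with boto3; every name, ARN, and subnet ID here is a placeholder, and the proxy needs a Secrets Manager secret holding the DB credentials plus an IAM role allowed to read it:

```python
import boto3

rds = boto3.client("rds")

# Create the proxy. Applications connect to the proxy endpoint; during brief
# cluster interruptions the proxy holds connections instead of dropping them.
rds.create_db_proxy(
    DBProxyName="aurora-proxy",                                    # placeholder
    EngineFamily="MYSQL",
    Auth=[{
        "AuthScheme": "SECRETS",
        "SecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:db-creds",  # placeholder
        "IAMAuth": "DISABLED",
    }],
    RoleArn="arn:aws:iam::123456789012:role/rds-proxy-role",       # placeholder
    VpcSubnetIds=["subnet-aaaa1111", "subnet-bbbb2222"],           # placeholders
)

# Point the proxy's default target group at the Aurora cluster.
rds.register_db_proxy_targets(
    DBProxyName="aurora-proxy",
    DBClusterIdentifiers=["aurora-57-cluster"],                    # placeholder
)
```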

0 votes

When you create a new Aurora cluster from a snapshot, the new cluster's error log contains the binlog file and position from the point at which the snapshot was taken. You can use that to set up replication from the old cluster to the new one.
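
A rough sketch of that flow with boto3 and PyMySQL; the log file name, the wording of the log line, and all endpoints, credentials, and binlog coordinates below are assumptions, so check the error log yourself and substitute the real values:

```python
import boto3
import pymysql

rds = boto3.client("rds")

# 1. Look for the binlog coordinates that the restored cluster wrote to its
#    error log at snapshot time. The exact message wording varies, so just
#    print anything mentioning "binlog" and read off the file name/position.
log = rds.download_db_log_file_portion(
    DBInstanceIdentifier="aurora-57-writer",    # placeholder instance name
    LogFileName="error/mysql-error.log",        # assumed error log name
)
for line in log["LogFileData"].splitlines():
    if "binlog" in line.lower():
        print(line)

# 2. On the NEW cluster, point replication at the old cluster using the
#    RDS stored procedures (no SUPER privilege required), then start it.
conn = pymysql.connect(
    host="aurora-57-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com",  # placeholder
    user="admin",            # placeholder
    password="...",          # placeholder
    autocommit=True,
)
with conn.cursor() as cur:
    cur.execute(
        "CALL mysql.rds_set_external_master(%s, %s, %s, %s, %s, %s, %s)",
        ("old-cluster-endpoint", 3306, "repl_user", "repl_password",
         "mysql-bin-changelog.000002", 120, 0),  # placeholder binlog file/position
    )
    cur.execute("CALL mysql.rds_start_replication")
```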

I've followed a similar process to what you've described in your question (multiple times in fact) and was able to keep the actual downtime in the low seconds range.