0 votes

We have a number of heavy users of our main analytics database, who want to not only read data, but also create large tables of derived data. This is starting to become prohibitive in terms of cost and the database slows down unacceptably when there are too many people writing to large tables at the same time.

My intended solution is to create a number of read/write replicas, such that our core data is replicated out from a master to a number of replicas with writing enabled, and the users can each work on one of the replicas.

However, my fear is that this doesn't seem to be the intended use case for enabling writes on replicas. My concern is that a user will build up a lot of valuable data on their replica. What can I do if the replica crashes?

With a read-only replica, of course you can just re-create the replica and everything is fine.

But if your replica has data you care about on it, separate from the master, things are more difficult. You can't just spin up a restore of the replica and re-attach it to the master, because you can't attach existing replicas, you can only spin up new ones.

Is there an architectural solution for this on AWS? Perhaps there is a way of attaching an existing replica after all?


3 Answers

2 votes

You cannot re-attach a read-replica back to the original master.

You cannot set up read-write replicas with Amazon RDS.

Note: There is an article from Amazon on enabling writing to a MySQL read replica. I have never tested this, so I do not know the implications.

How do I configure my Amazon RDS DB instance read replica to be modifiable?
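As I understand it, the approach in that article amounts to attaching a custom parameter group with read_only set to 0. A rough, untested boto3 sketch of those steps (all identifiers and the parameter group family below are placeholders):

```python
# Untested sketch based on the linked article: make a MySQL read replica
# writable by attaching a custom parameter group with read_only = 0.
# All names and the parameter group family are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# The default parameter group cannot be modified, so create a custom one.
rds.create_db_parameter_group(
    DBParameterGroupName="writable-replica-params",
    DBParameterGroupFamily="mysql8.0",
    Description="Read replica with writes enabled",
)

# Override read_only so the replica accepts writes.
rds.modify_db_parameter_group(
    DBParameterGroupName="writable-replica-params",
    Parameters=[{
        "ParameterName": "read_only",
        "ParameterValue": "0",
        "ApplyMethod": "immediate",
    }],
)

# Attach the custom parameter group to the replica.
rds.modify_db_instance(
    DBInstanceIdentifier="analytics-replica-1",
    DBParameterGroupName="writable-replica-params",
    ApplyImmediately=True,
)
```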

When you break a read-replica from the master, the read-replica becomes a master. To protect your data, set up scheduled backups just like you would on the original master.
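If you do promote a replica, you can enable automated backups in the same call; a minimal boto3 sketch (the identifier, retention period, and window are placeholders):

```python
# Sketch: promote a read replica to a standalone master and switch on
# automated backups in the same call. Values are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.promote_read_replica(
    DBInstanceIdentifier="analytics-replica-1",
    BackupRetentionPeriod=7,               # keep 7 days of automated backups
    PreferredBackupWindow="03:00-04:00",   # UTC window for the daily backup
)
```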

Since your problem is write performance, you have two options: scale up the instance size or increase the Provisioned IOPS. Use CloudWatch metrics to determine which resource is the bottleneck (CPU, memory, disk I/O). Another possible choice might be to switch to Aurora.
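As a rough illustration only (the identifiers, threshold, and sizes are placeholders, not recommendations), you could read a CloudWatch metric and then scale the instance class or Provisioned IOPS with boto3:

```python
# Sketch: read an RDS CloudWatch metric, then scale the instance class and
# Provisioned IOPS if it looks saturated. Identifiers, threshold, and sizes
# are placeholders, not recommendations.
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
rds = boto3.client("rds", region_name="us-east-1")

now = datetime.datetime.utcnow()
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "analytics-master"}],
    StartTime=now - datetime.timedelta(hours=1),
    EndTime=now,
    Period=300,
    Statistics=["Average"],
)
avg_cpu = max((p["Average"] for p in stats["Datapoints"]), default=0.0)

if avg_cpu > 80:
    # Scale up the instance class and bump Provisioned IOPS.
    rds.modify_db_instance(
        DBInstanceIdentifier="analytics-master",
        DBInstanceClass="db.r5.2xlarge",
        Iops=3000,
        ApplyImmediately=True,
    )
```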

2 votes

What can I do if the replica crashes?

RDS Replicas now support Multi-AZ.

https://aws.amazon.com/about-aws/whats-new/2018/01/amazon-rds-read-replicas-now-support-multi-az-deployments/

Multi-AZ gives you two instances, each with its own EBS volume, in two AZs, only one of them accessible at any one time and the other one sitting idle as a hot standby. When a failure occurs, the backup instance takes over and the DNS hostname for the instance switches from one to the other.

The actual implementation of Multi-AZ is not publicly documented, but it is said that replication is synchronous. The only way that seems possible is if the replication is storage-level replication rather than logical (binlog) replication, and there are various observations you can make which bear this out. It appears that the active instance actually writes to both volumes and the MySQL daemon on the backup instance is not running. When the failover event occurs, the server daemon on the backup is started up and goes through standard MySQL crash recovery.

Enabling Multi-AZ should address the question of what happens in the event of a crash... depending on your definition of "crash."
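Creating a replica with Multi-AZ enabled is one API call; a minimal boto3 sketch with placeholder identifiers:

```python
# Sketch: create a read replica with Multi-AZ enabled, per the announcement
# linked above. Identifiers and instance class are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="analytics-replica-1",
    SourceDBInstanceIdentifier="analytics-master",
    DBInstanceClass="db.r5.xlarge",
    MultiAZ=True,   # provisions a synchronous standby for the replica itself
)
```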

Replicas can have daily backups and snapshots and can be recovered with point-in-time recovery just like a standalone or master instance... Point-in-time recovery of a DB instance in RDS never modifies the instance being "recovered" -- it creates a new one from a snapshot then rolls it forward using the binlogs.

...but in this case, of course, the "recovered" instance would be a different instance and would no longer be an RDS replica.
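For illustration, a point-in-time restore of a replica with boto3 looks roughly like this (names and the timestamp are placeholders); note that it always produces a brand-new instance:

```python
# Sketch: point-in-time recovery of a (failed) replica. This always creates a
# brand-new instance; it never touches the instance being "recovered".
# Names and the timestamp are placeholders.
import datetime
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="analytics-replica-1",           # the failed replica
    TargetDBInstanceIdentifier="analytics-replica-1-recovered", # the new instance
    RestoreTime=datetime.datetime(2018, 6, 1, 12, 0, 0),
    # or: UseLatestRestorableTime=True,
)
```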

What you would need to do in that case would be to recover the failed instance to a point in time, and then create a new replica, and then dump and load the data from the recovered instance onto its replacement -- but only those tables that are not present on the master -- the tables that are unique to the writable replica.
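A rough sketch of that dump-and-load step -- shelling out to mysqldump and mysql from Python, copying only the tables unique to the writable replica from the recovered instance to the replacement (hostnames, credentials, database, and table names are all placeholders):

```python
# Sketch: copy ONLY the tables unique to the writable replica from the
# recovered instance onto its replacement, by shelling out to mysqldump/mysql.
# Hostnames, credentials, database, and table names are all placeholders.
import subprocess

DERIVED_TABLES = ["derived_sales_rollup", "derived_user_segments"]  # hypothetical

dump = subprocess.run(
    ["mysqldump",
     "--host=analytics-replica-1-recovered.abc123.us-east-1.rds.amazonaws.com",
     "--user=admin", "--password=secret",
     "analytics"] + DERIVED_TABLES,      # dump just these tables
    check=True, capture_output=True,
)

subprocess.run(
    ["mysql",
     "--host=analytics-replica-2.abc123.us-east-1.rds.amazonaws.com",
     "--user=admin", "--password=secret",
     "analytics"],
    input=dump.stdout, check=True,       # load the dump into the new replica
)
```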

As a point of clarification, MySQL native replication has no problem with tables existing on a replica but not on the master. MySQL replication does have a problem with tables that exist on both master and replica but with different data in the tables -- that's an unsupported configuration, so any plan to make a replica writable must require that tables coming from the master not be changed (with a few exceptions -- notably, additional non-unique indexes can safely be added to tables on a replica for query optimization purposes) -- otherwise, replication will be broken and no further replication events can execute on the replica.

If replication fails due to misuse of the replica (e.g. dropping or changing a table that the master subsequently modifies) it is still a replica as far as RDS is concerned, just a broken one, and can be restored to normal operation including RDS replication... but this is a delicate operation requiring a low-level understanding of MySQL native replication. The gist of such a fix is that the relevant data in the replica's data set must be modified such that it is identical to the data as it existed on the master immediately after the failing replication event executed. Once a replica's data is in this state, replication can be kick-started and will pick up where it left off, eventually catching back up to real-time again.
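To give a sense of the kick-start step only (this is not the data repair itself, which has to be done by hand first), something like the following with pymysql and the RDS-provided procedures; connection details are placeholders:

```python
# Sketch: inspect the failed replication event and kick replication off again
# on an RDS MySQL replica, AFTER the replica's data has been fixed by hand.
# Connection details are placeholders; RDS exposes its own procedures instead
# of letting you run START SLAVE directly.
import pymysql

conn = pymysql.connect(
    host="analytics-replica-1.abc123.us-east-1.rds.amazonaws.com",
    user="admin", password="secret",
    cursorclass=pymysql.cursors.DictCursor,
    autocommit=True,
)

with conn.cursor() as cur:
    # See which replication event failed and why.
    cur.execute("SHOW SLAVE STATUS")
    status = cur.fetchone()
    print(status["Last_SQL_Error"], status["Exec_Master_Log_Pos"])

    # Once the data matches what the master had after the failing event,
    # restart replication and let it catch back up.
    cur.execute("CALL mysql.rds_start_replication")

conn.close()
```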

A note of caution with writable replicas is that if replication fails due to such a condition, you do need to either repair it or destroy it or promote the replica to become its own independent master, which permanently decouples it from its original master -- an operation which cannot be undone. The reason a broken replica must be dealt with reasonably promptly is that RDS has protections that prevent the master from purging its binlogs until no managed replica has further need of them, which could cause them to back up on the master, consuming storage space there, or to pile up saved but unexecuted on the broken replica, consuming space there. The latter condition is more likely, but the former is not impossible to encounter.

As a last resort, and thoroughly unsanctioned, it is possible to configure an RDS instance that is not an RDS replica (e.g. after it has been promoted to master) to connect to another RDS instance and replicate from it, using the same steps that are designed for migrating from on-premise servers onto RDS, with mysql.rds_set_external_master. This gives you RDS-to-RDS replication that RDS doesn't actually realize is happening.
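A sketch of what that looks like with pymysql; hostnames, credentials, and the binlog coordinates (which you would read from SHOW MASTER STATUS on the source) are placeholders:

```python
# Sketch: point a promoted (now standalone) RDS instance at another RDS
# instance using the external-master procedures meant for on-premise
# migrations. Hostnames, credentials, and binlog coordinates are placeholders.
import pymysql

conn = pymysql.connect(
    host="promoted-replica.abc123.us-east-1.rds.amazonaws.com",
    user="admin", password="secret",
    autocommit=True,
)

with conn.cursor() as cur:
    cur.execute(
        "CALL mysql.rds_set_external_master(%s, %s, %s, %s, %s, %s, %s)",
        ("analytics-master.abc123.us-east-1.rds.amazonaws.com",  # source host
         3306,                            # source port
         "repl_user",                     # replication user on the source
         "repl_password",                 # its password
         "mysql-bin-changelog.000123",    # binlog file on the source
         120,                             # binlog position
         0),                              # 0 = no SSL
    )
    cur.execute("CALL mysql.rds_start_replication")

conn.close()
```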

1 vote

We have a number of heavy users of our main analytics database, who want to not only read data, but also create large tables of derived data. This is starting to become prohibitive in terms of cost and the database slows down unacceptably when there are too many people writing to large tables at the same time.

To solve this problem, you will need to create a read replica of your master DB (perhaps one dedicated to analytics, so they can do all the intensive work on it). Let the analytics team do what they need to do on that read replica and have them put the results into another database created specifically for results, or alternatively write them back to the master DB in off-peak hours (it should be separate, in my opinion).

My intended solution is to create a number of read/write replicas, such that our core data is replicated out from a master to a number of replicas with writing enabled, and the users can each work on one of the replicas.

In all honesty, I don't think this is the best solution to your problem. Keep the master DB for writes only, keep the read replicas so they help with the load on the master DB, and have the analytics team use their own read replica and output the results into another DB. Setting up too many read/write replicas that are Multi-AZ will have a significant impact on the latency of write operations on the database.

However, my fear is that this doesn't seem to be the intended use case for enabling writes on replicas. My concern is that a user will build up a lot of valuable data on their replica. What can I do if the replica crashes?

Enabling Multi-AZ will make sure AWS manages a standby of your DB that's ready to be failed over to in case of an error. However, keep in mind that latency will go up significantly; if you're doing latency-sensitive work you might have to reconsider your architecture.
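Turning Multi-AZ on for an existing instance is a single API call; a minimal boto3 sketch with a placeholder identifier:

```python
# Sketch: enable Multi-AZ on an existing instance. Identifier is a placeholder;
# the change provisions a standby and takes a while to complete.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.modify_db_instance(
    DBInstanceIdentifier="analytics-replica-1",
    MultiAZ=True,
    ApplyImmediately=True,
)
```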

But if your replica has data you care about on it, separate from the master, things are more difficult. You can't just spin up a restore of the replica and re-attach it to the master, because you can't attach existing replicas, you can only spin up new ones.

Why would the read replica be out of sync with the master? Yes, issues do happen, but if they do you have a completely different problem to worry about... Just remember that you cannot attach a read replica that has been promoted to a master database back to the existing master. You will have to recreate the read replica.