0
votes

I have a live production system with a google CloudSQL Postgres instance. The application will soon be undergoing a long running database schema modification to accommodate a change to the way the business operates. We've got a deployment plan that will allow the business to continue to operate during the schema change which essentially pauses replication to our read replica, and queues up API requests that would mutate the database for replay after the schema change is complete. Once the deployment is complete, the last step is to un-pause replication. But while the read replica is catching up, the schema changes will lock tables causing a lot of failing read requests. So before we un-pause the the read replication, we're going to divert all API db queries to the main instance which will have just finished the schema changes. So far so good, but I can't find a way to programmatically tell when the read replica is done catching up, so we can split our DB queries with writes going to the main instance and reads going to the replica.

Is there a PubSub topic or metric stream our application could subscribe to which would fire when replication catches up? I would also be happy with something that reports replication lag transaction count (or time) which the application could receive and when the trailing average comes below threshold, it switches over to reading from the replica again. The least desirable but still okay option would be continuous polling of an API or metric stream.

I know I can do this directly by querying the replica database itself for replication status, but that means we have to implement custom traffic directing in our application. Currently the framework we use allows us to route DB traffic in config. I know there should be metrics that are available from CloudSQL, but I cannot find them.

1

1 Answers

0
votes

I know it's not fully answer your question, but maybe you will be able to use it. Seems that you might be interested in Cloud Monitoring and metric:

database/mysql/replication/seconds_behind_master

According to the reference it reflects the lag of the replica behind the master.

Either that or database/replication/replica_lag should work. I don't think you can board this through pub/sub. Anyway you should take a look at the reference as it contains all metrics.