1
votes

Current configuration is basic Master->Slave replication. Each night various data import jobs run on master. During this time, slave replication is turned off and traffic is pointed to Slave (so as not to incur the performance bottleneck/penalty impact of the data loading jobs).

When the jobs complete, replication is turned back on (on the slave)..with the intent of pointing traffic back at the slave shortly thereafter (once slave is synced back up). The problem though is that there is a significant performance problem on the Master at this point. Presumably because the slave I/O Thread is working hard to get all the data replicated across from the Master.

As an alternative solution, turned off "just" the 'SQL Thread' on the slave (keep the IO Thread running all the time) so as not to bombard the master (later on .. all at once), once replication is resumed. However, the problem with this approach (apparently) is that now the slave is incurring continuous performance issues whilst the master is running the heavy data loading jobs (since the IO thread is on all the time and moving data across).

So the question is, how can I start/stop replication on the slave (in accordance with my data loading schedule/requirements) without taking a performance hit on either the master or the slave? Seems like you should be able to turn off replication completely and then turn it back on at a later date without impacting the master??

Thanks in advance!

1

1 Answers

0
votes

Can you characterise the load problem? Are you hung up on iowait, or is the network connection between master and slave saturated? How much memory do you have in your servers?

You can improve the latter by slinging a crossover cable between master and slave and routing all the replication traffic over that, keeping it off your LAN.

If the former is the problem, I'd say your best option is to get yourself more and/or faster disks, or even SSD! You could also do your binary logging to separate disks so that it doesn't slow down query activity.

Overall it sounds like you've just got a capacity problem, and you need to provide sufficient resources to deal with the load you're pointing at it - no amount of juggling is going to let you work around that. I've not encountered a config where binary logging is the limiting factor. Any basic RAID config should be handling at least 300M/sec, and for this to be a problem you need to be holding things up for more than say 30 sec at a time, implying that you need to move more than 9Gb of binlogs every night - are you really generating that much data?

Another alternative is to use DRBD for replication - that way the slave doesn't have to run the replicated queries or deal with binlogs at all, though it has other complications.