4
votes

I have simple Master/Slave configuration. I have 8GB of RAM on both my production boxes. I was using Master for Writes only and slave for Reads only. But over the weekend I ran one job which was to insert data on master which should be replicated to slave. Due to which my slave lang behind the master for almost 15-16 hours and it caused a big trouble for my reports as I was reading it from slave and slave was not having updated information.

With respect to this I have few queries:

  1. Are there any justified reasons why to should use slave for reads and not master.(My master has writes after very 5 mins.) and some jobs are schedule for reads from slave.

  2. I have 100GB table and everyday I have million record insertion on the same table. All selects and inserts happen on this table. I have opted for the way of segregating data year wise from this table to multiple tables in order to optimize this table is there any other way i can obtain to optimize and make execution of this table faster.

Please let me know if I have left anything unclear.

Below is the table design:

+----------------+------------------+------+-----+---------------------+----------------+
| Field          | Type             | Null | Key | Default             | Extra          |
+----------------+------------------+------+-----+---------------------+----------------+
| test_id        | int(11) unsigned | NO   | PRI | NULL                | auto_increment |
| prime_id       | int(11) unsigned | NO   | MUL | 0                   |                |
| prime2_id      | int(11) unsigned | NO   | MUL | 0                   |                |
| timestamp      | datetime         | NO   | MUL | 0000-00-00 00:00:00 |                |
| test_time      | int(11)          | NO   |     | 0                   |                |
| status         | int(11)          | NO   |     | 0                   |                |
| component      | int(11) unsigned | NO   |     | 0                   |                |
| c_component    | int(11) unsigned | NO   |     | 0                   |                |
| C2_component   | int(11) unsigned | NO   |     | 0                   |                |
| C3_component   | int(11) unsigned | NO   |     | 0                   |                |
| rt_component   | int(11) unsigned | NO   |     | 0                   |                |
| code           | int(11) unsigned | NO   |     | 0                   |                |
| ip             | int(11) unsigned | YES  |     | 0                   |                |
| step_id        | int(11) unsigned | YES  |     | NULL                |                |
+----------------+------------------+------+-----+---------------------+----------------+
This is the index information of the table:

| Table | Non_unique | Key_name              | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+-----------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tests |          0 | PRIMARY               |            1 | test_id     | A         |   629448388 |     NULL | NULL   |      | BTREE      |         |
| tests |          1 | ixf_prime_id          |            1 | prime_id    | A         |          14 |     NULL | NULL   |      | BTREE      |         |
| tests |          1 | ixf_prime2_id         |            1 | prime2_id   | A         |          14 |     NULL | NULL   |      | BTREE      |         |
| tests |          1 | ix_timestamp          |            1 | timestamp   | A         |   157362097 |     NULL | NULL   |      | BTREE      |         |
| tests |          1 | ix_prime_id_timestamp |            1 | prime_id    | A         |          14 |     NULL | NULL   |      | BTREE      |         |
| tests |          1 | ix_prime_id_timestamp |            2 | timestamp   | A         |   629448388 |     NULL | NULL   |      | BTREE      |         |
1

1 Answers

2
votes

We had a similar situation to this, but our slave lagged 3 or 4 days behind the master sometimes, and at others was completely up-to-date.

What we did to address this was to test the slave status at the top of each page generated (or script for scheduled jobs) and if the "seconds behind master" was greater than some arbitrary amount we decided on, we fired all queries for that page / job at the master. If the seconds behind master was within our allowed time limit (often zero), we then knew it was safe to fire the queries at the slave.

This was then expanded to decide which slave to fire the queries at when we had more than one (sort of a software load balancer!).

Eventually, we redesigned the architecture and the insert queries to ensure than the slave lag ended up being a very minor issue...

One thing you could do is to try to chunk up your inserts into smaller batches so a single insert doesn't take too long, allowing the slave to start that insert whilst the master is busy on the next one.

Hope this helps.

Dave