0
votes

I have a 4 machine mysql cluster running mariadb 10.0.21. I have two masters setup, that are slaving off of each other. And then two read-only slaves that are replicating from the 1st and 2nd db’s respectively.

In short, the way its setup is like this:

db1 replicates -> db2
db1 replicates -> db3
db2 replicates -> db1
db2 replicates -> db4

It's a pretty basic 4 db setup.

The problem I'm having is that I setup a nagios user to monitor the cluster on the 1st db.

I checked and I could log into the first 3 databases but not the 4th one with the nagios user. I could do that without creating the nagios user on each db because I'm replicating my mysql database on the first two database machines.

But for some reason the nagios user never was created on db4 the way it was on db2 even tho replication seems fine on all nodes.

As you probably recall, I have db 2 replicating to the 4th db.

And if I do a show master status on db 2, I can see that I'm replicating the mysql db:

MariaDB [mysql]> show master status;
+--------------------+----------+------------------------------+------------------+
| File               | Position | Binlog_Do_DB                 | Binlog_Ignore_DB |
+--------------------+----------+------------------------------+------------------+
| mariadb-bin.000078 |      376 | jfwiki,jokefire,bacula,mysql |                  |
+--------------------+----------+------------------------------+------------------+
1 row in set (0.00 sec)

If I check that the nagios user is there on the 2nd db, it is.

 MariaDB [mysql]> select User,Host,Password from user where user like 'nagios';
    +--------+-------------+-------------------------------------------+
    | User   | Host        | Password                                  |
    +--------+-------------+-------------------------------------------+
    | nagios | 52.4.204.96 | *somepasswordhash                         |
    +--------+-------------+-------------------------------------------+
    1 row in set (0.00 sec)

This user was not created on db2, but is there because of the replication.

And if I check the slave status on db4, replication seems completely fine:

[root@db4:~] #mysql -e "show slave status\G" | egrep "Slave_IO_State|Master_Host|Slave_IO_Running|Slave_SQL_Running|Last_Errno|Seconds_Behind_Master"
               Slave_IO_State: Waiting for master to send event
                  Master_Host: db2.example.com
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
                   Last_Errno: 0
        Seconds_Behind_Master: 0

And if I check for the presence of the nagios user on db4 the way I did on db2, the user just isn't there:

MariaDB [mysql]> select User,Host from user where user like 'nagios';
Empty set (0.00 sec)

So my question is, why did the nagios user not get replicated to db4 the way it did from db1 -> db2 and from db1 -> db3? Even tho slave replication on db4 seems to be okay? I could log into all those hosts using the nagios user from the monitoring host.

Here's the full output of the slave replication command on db4 in case I missed anything from my grep earlier:

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: db2.example.com
                  Master_User: jf_slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mariadb-bin.000078
          Read_Master_Log_Pos: 376
               Relay_Log_File: db4-relay-bin.000044
                Relay_Log_Pos: 537
        Relay_Master_Log_File: mariadb-bin.000078
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 376
              Relay_Log_Space: 1121
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: Yes
           Master_SSL_CA_File: /opt/mysql/ca.crt
           Master_SSL_CA_Path:
              Master_SSL_Cert: /opt/mysql/db4.example.com.crt
            Master_SSL_Cipher:
               Master_SSL_Key: /opt/mysql/db4.example.com.key
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 2
               Master_SSL_Crl: /opt/mysql/ca.crt
           Master_SSL_Crlpath:
                   Using_Gtid: No
                  Gtid_IO_Pos:
1 row in set (0.00 sec)

I am seeing some errors in the mariadb logs on db4, however they're no different than the errors I'm seeing on the 1st 3 databases where the nagios user data replicated successfully.

151004 15:34:36 [Note] Error reading relay log event: slave SQL thread was killed
151004 15:34:36 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
151004 15:34:36 [Note] Slave I/O thread killed while reading event
151004 15:34:36 [Note] Slave I/O thread exiting, read up to log 'mariadb-bin.000078', position 376
151004 15:36:47 [Note] Slave SQL thread initialized, starting replication in log 'mariadb-bin.000078' at position 376, relay log './db4-relay-bin.000042' position: 537
151004 15:36:47 [Note] Slave I/O thread: connected to master '[email protected]:3306',replication started in log 'mariadb-bin.000078' at position 376
151007  4:24:12 [Note] Error reading relay log event: slave SQL thread was killed
151007  4:24:12 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
151007  4:24:12 [Note] Slave I/O thread killed while reading event
151007  4:24:12 [Note] Slave I/O thread exiting, read up to log 'mariadb-bin.000078', position 376
151007  4:28:20 [Note] Slave SQL thread initialized, starting replication in log 'mariadb-bin.000078' at position 376, relay log './db4-relay-bin.000043' position: 537
151007  4:28:20 [Note] Slave I/O thread: connected to master '[email protected]:3306',replication started in log 'mariadb-bin.000078' at position 376

So why would there be this database inconsistency, when all the indicators of replication seem okay? Why can't the nagios user log into db4 the way it can on the 1st 3 db's?

Thanks

1
do all your nodes have the same hardware? do all your nodes have different serverids? n also if your loosing connection on db4 that often is not possible that connection was lost during the creation of nagios user? why dont you just create that user on the slave? Its difficult to know exactly why it did not get created on the slave...There is such a thing as slave drift...Percona built a whole tool to resync slaves in mysql...BK435
They're all from the exact same ami on AWS. All identical in terms of OS, memory, disk, etc. I checked the server-ids on all the nodes and they are unique:checking server id on db1 server-id=1 checking server id on db2 server-id=2 checking server id on db3 server-id=3 checking server id on db4 server-id=4 And the reason I'm not just creating the nagios user on db4 is that I've seen this problem before. I use the db farm to run a wiki. And I've noticed that every once in a while, I'll create a page or alter some text. And the changes will go to only one db node.bluethundr
And when I go to check replication, everything OK in the replication setup. I'll, for instance, go into the config of the wiki site, and instead of using the db array, I'll point the config to only one db host at a time until I find the 'lost' page or altered content. At that point I'll usually dump the database from the "good" database and import it onto all the other hosts that are lacking the changes I'm after. So, the reason I'm making a big deal about this nagios user problem, is because I think it illustrates the issue I've been having with lost wiki pages.bluethundr
How would you recommend combatting slave drift without shelling out money to Percona? Thanks!!bluethundr
percona tools to resync slave is free...percona.com/software/mysql-tools/percona-toolkitBK435

1 Answers

0
votes

Because you need to set Log_slave_updates in the CNF file on each server.