1
votes

I'm having trouble with resurrecting a master node with Sentinel. Specifically, slaves are promoted properly when the master is lost, but the master upon reboot is never demoted. However, if I restart Sentinel immediately the master node is demoted. Is my configuration bad, or am I missing something basic?

EDIT: Xpost with https://groups.google.com/forum/#!topic/redis-db/4AnGNssqYTw

I setup a few VMs as follows, all with Redis 3.1.999:

192.168.0.101 - Redis Slave
192.168.0.102 - Redis Slave
192.168.0.103 - Redis Master
192.168.0.201 - Sentinel
192.168.0.202 - Sentinel

My Sentinel configuration, for both sentinels:

loglevel verbose
logfile "/tmp/sentinel.log"
sentinel monitor redisA01 192.168.0.101 6379 2
sentinel down-after-milliseconds redisA01 30000
sentinel failover-timeout redisA01 120000

I stop redis on the master node; as expected Sentinel catches it and promotes a slave to master.

3425:X 08 Sep 23:47:43.839 # +sdown master redisA01 192.168.0.103 6379
3425:X 08 Sep 23:47:43.896 # +odown master redisA01 192.168.0.103 6379 #quorum 2/2
3425:X 08 Sep 23:47:43.896 # +new-epoch 53
3425:X 08 Sep 23:47:43.896 # +try-failover master redisA01 192.168.0.103 6379
3425:X 08 Sep 23:47:43.898 # +vote-for-leader 71de0d8f6250e436e1f76800cbe8cbae56c1be7c 53
3425:X 08 Sep 23:47:43.901 # 192.168.0.201:26379 voted for 71de0d8f6250e436e1f76800cbe8cbae56c1be7c 53
3425:X 08 Sep 23:47:43.975 # +elected-leader master redisA01 192.168.0.103 6379
3425:X 08 Sep 23:47:43.976 # +failover-state-select-slave master redisA01 192.168.0.103 6379
3425:X 08 Sep 23:47:44.077 # +selected-slave slave 192.168.0.102:6379 192.168.0.102 6379 @ redisA01 192.168.0.103 6379
3425:X 08 Sep 23:47:44.078 * +failover-state-send-slaveof-noone slave 192.168.0.102:6379 192.168.0.102 6379 @ redisA01 192.168.0.103 6379
3425:X 08 Sep 23:47:44.977 * +failover-state-wait-promotion slave 192.168.0.102:6379 192.168.0.102 6379 @ redisA01 192.168.0.103 6379
3425:X 08 Sep 23:47:44.980 - -role-change slave 192.168.0.102:6379 192.168.0.102 6379 @ redisA01 192.168.0.103 6379 new reported role is master
3425:X 08 Sep 23:47:44.981 # +promoted-slave slave 192.168.0.102:6379 192.168.0.102 6379 @ redisA01 192.168.0.103 6379
3425:X 08 Sep 23:47:44.981 # +failover-state-reconf-slaves master redisA01 192.168.0.103 6379
3425:X 08 Sep 23:47:45.068 * +slave-reconf-sent slave 192.168.0.101:6379 192.168.0.101 6379 @ redisA01 192.168.0.103 6379
3425:X 08 Sep 23:47:46.031 * +slave-reconf-inprog slave 192.168.0.101:6379 192.168.0.101 6379 @ redisA01 192.168.0.103 6379
3425:X 08 Sep 23:47:46.032 * +slave-reconf-done slave 192.168.0.101:6379 192.168.0.101 6379 @ redisA01 192.168.0.103 6379
3425:X 08 Sep 23:47:46.101 # -odown master redisA01 192.168.0.103 6379
3425:X 08 Sep 23:47:46.101 # +failover-end master redisA01 192.168.0.103 6379
3425:X 08 Sep 23:47:46.102 # +switch-master redisA01 192.168.0.103 6379 192.168.0.102 6379
3425:X 08 Sep 23:47:46.103 * +slave slave 192.168.0.101:6379 192.168.0.101 6379 @ redisA01 192.168.0.102 6379
3425:X 08 Sep 23:47:46.103 * +slave slave 192.168.0.103:6379 192.168.0.103 6379 @ redisA01 192.168.0.102 6379

I wait a few minutes and restart Redis on the former master node. Unexpectedly (to me) the node is not demoted to slave.

3425:X 08 Sep 23:48:16.105 # +sdown slave 192.168.0.103:6379 192.168.0.103 6379 @ redisA01 192.168.0.102 6379
3425:X 08 Sep 23:50:09.131 # -sdown slave 192.168.0.103:6379 192.168.0.103 6379 @ redisA01 192.168.0.102 6379

After waiting a few more minutes, I restart one of the sentinels. Immediately it detects the dangling former master node and demotes it.

3425:signal-handler (1441758237) Received SIGTERM scheduling shutdown...
...
3670:X 09 Sep 00:23:57.687 # Sentinel ID is 71de0d8f6250e436e1f76800cbe8cbae56c1be7c
3670:X 09 Sep 00:23:57.687 # +monitor master redisA01 192.168.0.102 6379 quorum 2
3670:X 09 Sep 00:23:57.690 - -role-change slave 192.168.0.103:6379 192.168.0.103 6379 @ redisA01 192.168.0.102 6379 new reported role is master
3670:X 09 Sep 00:23:58.708 - Accepted 192.168.0.201:49731
3670:X 09 Sep 00:24:07.778 * +convert-to-slave slave 192.168.0.103:6379 192.168.0.103 6379 @ redisA01 192.168.0.102 6379
3670:X 09 Sep 00:24:17.801 - +role-change slave 192.168.0.103:6379 192.168.0.103 6379 @ redisA01 192.168.0.102 6379 new reported role is slave
1

1 Answers

1
votes

I would check for multiple processes on the master, and for possible circular replication. I you look at the end of the first log batch you will see it detects the 103 IP as a slave already via the +slave entry. I would try to look at why upon promotion the new master already shows the old master as a slave.

Upon restart the reconfiguration is happening, according to the logs provided, before slave rediscovery whereupon it detects the slave reporting itself as master.

Try it again, it directly interrogate each node before restarting sentinel to see what they each have for master and slaves. That might illuminate the underlying issue.

Edit: your sentinel configuration described is incorrect. You list the master as 103 in your listing, but the sentinel config file you posted indicates 101, which is a slave according to your listing.

Also, add a third sentinel. Two makes it easy to have split brain, which you may well what you are seeing.