0 votes

I have an architecture with three Redis instances (one master and two slaves) and three Sentinel instances. In front of them there is an HAProxy. Everything works well until the master Redis instance goes down. The new master is properly chosen by Sentinel. However, the old master (which is now down) is not reconfigured as a slave. As a result, when that instance comes back up I have two masters for a short period of time (about 11 seconds). After that time the instance that was brought back up is properly downgraded to a slave.

Shouldn't it work so that when the master goes down it is downgraded to a slave straight away? That way, when it came back up, it would be a slave immediately. I know that (since Redis 2.8?) there is the CONFIG REWRITE functionality, so the config cannot be modified while the Redis instance is down.

Having two masters for some time is a problem for me because, for that short period, HAProxy load-balances requests between those two masters instead of sending them to the single master.

Is there any way to downgrade the failed master to slave immediately?

Obviously, I changed the Sentinel timeouts.
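(By "timeouts" I mean the per-master directives in sentinel.conf; the values below are only illustrative, not my exact settings:)

    sentinel monitor redis-ha 127.0.0.1 6379 2
    sentinel down-after-milliseconds redis-ha 1000
    sentinel failover-timeout redis-ha 6000
    sentinel parallel-syncs redis-ha 1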

Here are some logs from Sentinel and Redis instances after the master goes down:

Sentinel

81358:X 23 Jan 22:12:03.088 # +sdown master redis-ha 127.0.0.1 6379
81358:X 23 Jan 22:12:03.149 # +new-epoch 1
81358:X 23 Jan 22:12:03.149 # +vote-for-leader 6b5b5882443a1d738ab6849ecf4bc6b9b32ec142 1
81358:X 23 Jan 22:12:03.174 # +odown master redis-ha 127.0.0.1 6379 #quorum 3/2
81358:X 23 Jan 22:12:03.174 # Next failover delay: I will not start a failover before Sat Jan 23 22:12:09 2016
81358:X 23 Jan 22:12:04.265 # +config-update-from sentinel 127.0.0.1:26381 127.0.0.1 26381 @ redis-ha 127.0.0.1 6379
81358:X 23 Jan 22:12:04.265 # +switch-master redis-ha 127.0.0.1 6379 127.0.0.1 6381
81358:X 23 Jan 22:12:04.266 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ redis-ha 127.0.0.1 6381
81358:X 23 Jan 22:12:04.266 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ redis-ha 127.0.0.1 6381
81358:X 23 Jan 22:12:06.297 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ redis-ha 127.0.0.1 6381

Redis

81354:S 23 Jan 22:12:03.341 * MASTER <-> SLAVE sync started
81354:S 23 Jan 22:12:03.341 # Error condition on socket for SYNC: Connection refused
81354:S 23 Jan 22:12:04.265 * Discarding previously cached master state.
81354:S 23 Jan 22:12:04.265 * SLAVE OF 127.0.0.1:6381 enabled (user request from 'id=7 addr=127.0.0.1:57784 fd=10 name=sentinel-6b5b5882-cmd age=425 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=14 qbuf-free=32754 obl=36 oll=0 omem=0 events=rw cmd=exec')
81354:S 23 Jan 22:12:04.265 # CONFIG REWRITE executed with success.
81354:S 23 Jan 22:12:04.371 * Connecting to MASTER 127.0.0.1:6381
81354:S 23 Jan 22:12:04.371 * MASTER <-> SLAVE sync started
81354:S 23 Jan 22:12:04.371 * Non blocking connect for SYNC fired the event.
81354:S 23 Jan 22:12:04.371 * Master replied to PING, replication can continue...
81354:S 23 Jan 22:12:04.371 * Partial resynchronization not possible (no cached master)
81354:S 23 Jan 22:12:04.372 * Full resync from master: 07b3c8f64bbb9076d7e97799a53b8b290ecf470b:1
81354:S 23 Jan 22:12:04.467 * MASTER <-> SLAVE sync: receiving 860 bytes from master
81354:S 23 Jan 22:12:04.467 * MASTER <-> SLAVE sync: Flushing old data
81354:S 23 Jan 22:12:04.467 * MASTER <-> SLAVE sync: Loading DB in memory
81354:S 23 Jan 22:12:04.467 * MASTER <-> SLAVE sync: Finished with success
This isn't a programming question but a system/server admin one. Serverfault is where to ask this. – The Real Bill
True, indeed. I'm moving it to Serverfault. – Damian
@Damian did you find a solution to your problem? I am also facing the same issue. – akipro

3 Answers

2 votes

I was also getting the same error when I wanted to switch the master in my Redis cluster using Sentinel.

+vote-for-leader xxxxxxxxxxxxxxxxxxxxxxxx8989 10495
Next failover delay: I will not start a failover before Fri Aug 2 23:23:44 2019

After resetting Sentinel, the cluster works as expected:

SENTINEL RESET *

or

SENTINEL RESET mymaster

Run the above command on all Sentinel servers.
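For example, to issue the reset to each Sentinel with redis-cli (a sketch assuming the Sentinels listen on 127.0.0.1 ports 26379-26381 and using the master group name from this answer; substitute your own ports and name):

    redis-cli -p 26379 SENTINEL RESET mymaster
    redis-cli -p 26380 SENTINEL RESET mymaster
    redis-cli -p 26381 SENTINEL RESET mymaster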

0 votes

In the event a Redis node goes down, when/if it recovers, it will recover with the same role it had prior to going down. The Sentinel cannot reconfigure the node if it is unable to ping it. So, there's a brief period of time between the node coming back up and the Sentinel acknowledging and reconfiguring it. This explains the multi-master state.

If you are set on using HAProxy, one workaround is to reconfigure the Redis node's role before starting the process: Redis will boot as a slave as long as there is a SLAVEOF entry in redis.conf. The primary issue with this workaround is that it doesn't solve network partition scenarios.
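As an illustration (addresses taken from the question's logs, where 127.0.0.1:6381 was promoted; point the entry at whichever node Sentinel currently reports as master), the recovering node's redis.conf could contain:

    # redis.conf of the node that failed (assumed here to be 127.0.0.1:6379)
    # make it boot as a slave of the current master
    slaveof 127.0.0.1 6381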

Hope that helps.

0 votes

If using HAProxy, you can try to query uptime_in_seconds with something like this:

backend redis
    mode tcp
    balance first
    timeout queue 5s
    default-server check inter 1s fall 2 rise 2 maxconn 100
    option tcp-check
    tcp-check connect
    tcp-check send AUTH\ <secret>\r\n
    tcp-check expect string +OK
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send info\ server\r\n
    tcp-check expect rstring uptime_in_seconds:\d{2,}
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis-1 10.0.0.10:9736
    server redis-2 10.0.0.20:9736
    server redis-3 10.0.0.30:9736

Notice the:

  tcp-check expect rstring uptime_in_seconds:\d{2,}

If the uptime is less than 10 seconds (fewer than two digits in uptime_in_seconds), the node will not be added to the pool. This keeps a freshly restarted node out of rotation until Sentinel has had time to reconfigure it, which avoids the brief two-master window described in the question.
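
To see what the check matches against, you can query a node directly (using the addresses assumed in the config above):

    redis-cli -h 10.0.0.10 -p 9736 -a <secret> info server | grep uptime_in_seconds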