Redis - after a network error, all redis-servers are set as slaves and no master is ever elected again because of -failover-abort-no-good-slave

Question

Environment: two redis-servers, five sentinels
Red Hat Enterprise Linux Server release 5.11 (Tikanga)
Redis server v=3.0.5 sha=00000000:0 malloc=jemalloc-3.6.0 bits=64 build=d23f872bbf615c9

Due to a network error, all machines were isolated and could not see each other for a few seconds.

master_log:
576:M 10 Oct 21:56:15.082 # Connection with slave client id #17278 lost.
576:S 10 Oct 21:56:26.044 * SLAVE OF 10.25.144.88:6379 enabled (user request from 'id=1956135 addr=10.25.144.42:50550 fd=1298 name=sentinel-e9e5b26c-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=rw cmd=exec')

slave_log:
4159:M 10 Oct 21:56:15.080 # Connection with master lost.
4159:M 10 Oct 21:56:15.080 * MASTER MODE enabled (user request from 'id=76394 addr=10.25.144.42:35032 fd=9 name=sentinel-e9e5b26c-cmd age=97297 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=14 qbuf-free=32754 obl=36 oll=0 omem=0 events=rw cmd=exec')
4159:M 10 Oct 21:56:15.082 # CONFIG REWRITE executed with success.

sentinel2_log:
25831:X 10 Oct 21:56:26.124 * +convert-to-slave slave 10.25.144.88:6379 10.25.144.88 6379 @ coremaster 10.25.144.87 6379

slave_log:
4159:S 10 Oct 21:56:26.128 * SLAVE OF 10.25.144.87:6379 enabled (user request from 'id=91945 addr=10.25.144.79:48233 fd=6 name=sentinel-00e48109-cmd age=11 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=rw cmd=exec')

Now both the master and the slave are slaves, and every new vote-for-leader ends with -failover-abort-no-good-slave:

30120:X 10 Oct 22:03:22.011 # +new-epoch 4491
30120:X 10 Oct 22:03:22.011 # +try-failover master coremaster 10.25.144.87 6379
30120:X 10 Oct 22:03:22.030 # +vote-for-leader 8bf8389ca5d9eb8c1bfde2d5621a639028aeae9e 4491
30120:X 10 Oct 22:03:22.036 # 10.25.144.79:26379 voted for 8bf8389ca5d9eb8c1bfde2d5621a639028aeae9e 4491
30120:X 10 Oct 22:03:22.044 # 10.25.144.87:26379 voted for 8bf8389ca5d9eb8c1bfde2d5621a639028aeae9e 4491
30120:X 10 Oct 22:03:22.048 # 10.25.144.88:26379 voted for 8bf8389ca5d9eb8c1bfde2d5621a639028aeae9e 4491
30120:X 10 Oct 22:03:22.054 # 10.25.144.80:26379 voted for 8bf8389ca5d9eb8c1bfde2d5621a639028aeae9e 4491
30120:X 10 Oct 22:03:22.092 # +elected-leader master coremaster 10.25.144.87 6379
30120:X 10 Oct 22:03:22.092 # +failover-state-select-slave master coremaster 10.25.144.87 6379
30120:X 10 Oct 22:03:22.192 # -failover-abort-no-good-slave master coremaster 10.25.144.87 6379

At first everything is fine: the master is converted to a slave and the slave is promoted to master. But one sentinel (sentinel2_log) apparently still believes the old master is up, and it tells the promoted slave to go back to being a slave. In the end both redis-servers are slaves, and no master is elected.
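To see which sentinel asked for each role change, the addr and name fields can be pulled out of the "user request from" entries in the redis-server logs. This is only a minimal sketch, assuming the log lines have the shape shown in the excerpts above; the function name role_changes and the regular expression are illustrative, not part of Redis.

```python
import re

# Matches redis-server log entries of the form
#   <pid>:<role> <date> * SLAVE OF <host:port> enabled (user request from '<client info>')
#   <pid>:<role> <date> * MASTER MODE enabled (user request from '<client info>')
ROLE_CHANGE = re.compile(
    r"\* (?P<action>SLAVE OF \S+|MASTER MODE) enabled "
    r"\(user request from '(?P<client>[^']*)'\)"
)


def role_changes(log_text):
    """Return (action, requester_addr, requester_name) for each role change."""
    changes = []
    for match in ROLE_CHANGE.finditer(log_text):
        # The client description is a space-separated list of key=value fields.
        fields = dict(
            item.split("=", 1)
            for item in match.group("client").split()
            if "=" in item
        )
        changes.append(
            (match.group("action"), fields.get("addr"), fields.get("name"))
        )
    return changes


# The two role-change entries from slave_log in the question.
sample = (
    "4159:M 10 Oct 21:56:15.080 * MASTER MODE enabled (user request from "
    "'id=76394 addr=10.25.144.42:35032 fd=9 name=sentinel-e9e5b26c-cmd "
    "age=97297 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=14 "
    "qbuf-free=32754 obl=36 oll=0 omem=0 events=rw cmd=exec')\n"
    "4159:S 10 Oct 21:56:26.128 * SLAVE OF 10.25.144.87:6379 enabled "
    "(user request from 'id=91945 addr=10.25.144.79:48233 fd=6 "
    "name=sentinel-00e48109-cmd age=11 idle=0 flags=x db=0 sub=0 psub=0 "
    "multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=rw cmd=exec')\n"
)

for action, addr, name in role_changes(sample):
    print(action, "requested by", name, "at", addr)
```

On the slave_log excerpt this shows that the promotion and the later demotion were requested by two different sentinels (10.25.144.42 and 10.25.144.79), which is the kind of disagreement worth checking first.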

The config file is almost the default one:

tcp-keepalive 0
stop-writes-on-bgsave-error no

I do not know why this happened or how to fix it. Any ideas would be appreciated, thanks.

Jose Miguel Arigita · Accepted Answer · 2017-10-25T12:01:27

Sorry, while checking the installation we realized that there was another group of sentinels monitoring the master, left over from an older version and never removed.

So with the data shown in the question everything is fine; there is no problem in that setup itself.

But, because of an installation error, there are two groups of sentinels, configured with different names in their sentinel monitor directive ("sentinel monitor coremaster ip 6382 3").

Each group of sentinels issues its own commands, and the result is that all nodes end up as slaves with no master.
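A quick way to catch this kind of leftover is to compare the sentinel monitor lines across every sentinel config file found on the hosts. The following is a minimal sketch, assuming the config files can be read as text; the function name conflicting_monitors, the file labels and the master name "oldmaster" are hypothetical and used only for illustration.

```python
import re
from collections import defaultdict

# sentinel monitor <master-name> <ip> <port> <quorum>
MONITOR = re.compile(
    r"^\s*sentinel\s+monitor\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s*$",
    re.MULTILINE | re.IGNORECASE,
)


def conflicting_monitors(configs):
    """configs maps a label (e.g. a file path) to the config file's text.

    Returns {(ip, port): {master_name: [labels]}} for every Redis address
    that is monitored under more than one master name.
    """
    by_target = defaultdict(lambda: defaultdict(list))
    for label, text in configs.items():
        for name, ip, port, _quorum in MONITOR.findall(text):
            by_target[(ip, int(port))][name].append(label)
    return {
        target: {name: sorted(labels) for name, labels in names.items()}
        for target, names in by_target.items()
        if len(names) > 1
    }


# Hypothetical example: a current and a forgotten sentinel config that
# both point at the same redis-server under different master names.
configs = {
    "sentinel-new.conf": "port 26379\nsentinel monitor coremaster 10.25.144.87 6379 3\n",
    "sentinel-old.conf": "port 26380\nsentinel monitor oldmaster 10.25.144.87 6379 2\n",
}

for target, names in conflicting_monitors(configs).items():
    print(target, "is monitored as", names)
```

Any address reported here is being managed by more than one sentinel group, so the stale group should be stopped and its config removed before relying on failover again.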
