Redis Cluster: No automatic failover for master failure

Question

I am trying to set up a Redis cluster on six machines. I have a Vagrant cluster of six machines:

192.168.56.101
192.168.56.102
192.168.56.103
192.168.56.104
192.168.56.105
192.168.56.106

All of them are running redis-server.

I edited the /etc/redis/redis.conf file on all of the above servers, adding this:

cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
cluster-slave-validity-factor 0
appendonly yes

I then ran this on one of the six machines:

./redis-trib.rb create --replicas 1 192.168.56.101:6379 192.168.56.102:6379 192.168.56.103:6379 192.168.56.104:6379 192.168.56.105:6379 192.168.56.106:6379
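For reference, `create --replicas 1` with six nodes should produce three masters (6 / (1 + 1)), each owning a contiguous, roughly equal share of the 16384 hash slots. The minimal Python sketch below computes such an even split; `split_slots` is a hypothetical helper, and the round-half-up step is an assumption modeled on how redis-trib.rb allocates slots, not code taken from it.

```python
import math

HASH_SLOTS = 16384  # Redis Cluster always uses 16384 hash slots


def split_slots(num_masters: int) -> list[tuple[int, int]]:
    """Divide the hash slots into contiguous, nearly equal (first, last) ranges."""
    per_master = HASH_SLOTS / num_masters
    ranges = []
    first = 0
    cursor = 0.0
    for i in range(num_masters):
        # Round half up to the nearest slot boundary (assumed rounding rule).
        last = math.floor(cursor + per_master - 1 + 0.5)
        if i == num_masters - 1:
            last = HASH_SLOTS - 1  # the last master takes every remaining slot
        ranges.append((first, last))
        first = last + 1
        cursor += per_master
    return ranges


# Six nodes with one replica per master leaves 6 // 2 = 3 masters.
for first, last in split_slots(6 // (1 + 1)):
    print(f"{first}-{last}")
```

For three masters this prints 0-5460, 5461-10922 and 10923-16383. Comparing that with the slot ranges reported by `cluster nodes` is a quick way to check whether every master actually owns slots.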

The Redis cluster is up and running. I checked it manually by setting a value on one machine, and it shows up on the other machines.

$ redis-cli -p 6379 cluster nodes
3c6ffdddfec4e726f29d06a6da550f94d976f859 192.168.56.105:6379 master - 0 1450088598212 5 connected
47d04bc98ab42fc793f9f382855e5c54ab8f2e20 192.168.56.102:6379 slave caf2cec45114dc8f4cbc6d96c6dbb20b62a39f90 0 1450088598716 7 connected
040d4bb6a00569fc44eec05440a5fe0796952ccf 192.168.56.101:6379 myself,slave 5318e48e9ef0fc68d2dc723a336b791fc43e23c8 0 0 4 connected
caf2cec45114dc8f4cbc6d96c6dbb20b62a39f90 192.168.56.104:6379 master - 0 1450088599720 7 connected 0-10922
d78293d0821de3ab3d2bca82b24525e976e7ab63 192.168.56.106:6379 slave 5318e48e9ef0fc68d2dc723a336b791fc43e23c8 0 1450088599316 8 connected
5318e48e9ef0fc68d2dc723a336b791fc43e23c8 192.168.56.103:6379 master - 0 1450088599218 8 connected 10923-16383
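To make the topology above easier to read, here is a minimal Python sketch that parses this `cluster nodes` output into masters, their slot ranges, and their replicas. It assumes the space-separated field order of `CLUSTER NODES` (node id, address, flags, master id, ping-sent, pong-recv, config epoch, link state, then slot ranges); `parse_cluster_nodes` and `summarize` are hypothetical helper names, and the sample text is copied verbatim from the output above.

```python
SAMPLE = """\
3c6ffdddfec4e726f29d06a6da550f94d976f859 192.168.56.105:6379 master - 0 1450088598212 5 connected
47d04bc98ab42fc793f9f382855e5c54ab8f2e20 192.168.56.102:6379 slave caf2cec45114dc8f4cbc6d96c6dbb20b62a39f90 0 1450088598716 7 connected
040d4bb6a00569fc44eec05440a5fe0796952ccf 192.168.56.101:6379 myself,slave 5318e48e9ef0fc68d2dc723a336b791fc43e23c8 0 0 4 connected
caf2cec45114dc8f4cbc6d96c6dbb20b62a39f90 192.168.56.104:6379 master - 0 1450088599720 7 connected 0-10922
d78293d0821de3ab3d2bca82b24525e976e7ab63 192.168.56.106:6379 slave 5318e48e9ef0fc68d2dc723a336b791fc43e23c8 0 1450088599316 8 connected
5318e48e9ef0fc68d2dc723a336b791fc43e23c8 192.168.56.103:6379 master - 0 1450088599218 8 connected 10923-16383
"""


def parse_cluster_nodes(text: str) -> dict[str, dict]:
    """Map node id -> address, role, master id and slot ranges."""
    nodes = {}
    for line in text.strip().splitlines():
        fields = line.split()
        flags = fields[2].split(",")
        nodes[fields[0]] = {
            "addr": fields[1].split("@")[0],  # drop a @cluster-bus-port suffix if present
            "role": "master" if "master" in flags else "slave",
            "master_id": None if fields[3] == "-" else fields[3],
            "slots": fields[8:],  # everything after link-state is a slot range
        }
    return nodes


def summarize(nodes: dict[str, dict]) -> list[tuple[str, list[str], list[str]]]:
    """Return (master address, slot ranges, replica addresses) for each master."""
    rows = []
    for node_id, info in nodes.items():
        if info["role"] == "master":
            replicas = [n["addr"] for n in nodes.values() if n["master_id"] == node_id]
            rows.append((info["addr"], info["slots"], replicas))
    return rows


for addr, slots, replicas in summarize(parse_cluster_nodes(SAMPLE)):
    print(addr, "slots:", ", ".join(slots) or "none", "replicas:", ", ".join(replicas) or "none")
```

On this sample it reports 192.168.56.105 as a master with no slots, 192.168.56.104 (slots 0-10922) with one replica (192.168.56.102), and 192.168.56.103 (slots 10923-16383) with two replicas (192.168.56.101 and 192.168.56.106).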

My problem is that when I shut down or stop redis-server on any one machine that is a master, the whole cluster goes down, but if all three slaves die, the cluster still works properly.

What should I do so that a slave is promoted to master when a master fails (fault tolerance)?

I was under the assumption that Redis handles all of this and that I need not worry about it after deploying the cluster. Am I right, or would I have to do this myself?

Another question: let's say I have six machines with 16GB of RAM each. How much total data would I be able to handle on this Redis cluster with three masters and three slaves?

Thank you.

Why the close vote? What's wrong with the question? Some comments would be nice. - Nagri

Asad Asad · Accepted Answer · 2017-01-27T23:47:38

The setting cluster-slave-validity-factor 0 may be the culprit here.

From redis.conf:

# A slave of a failing master will avoid to start a failover if its data
# looks too old.

In your setup, the slave of the terminated master considers itself unfit to be elected master, because the time since it last contacted its master is greater than the computed value of:

(node-timeout * slave-validity-factor) + repl-ping-slave-period
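To put numbers on the formula: `cluster-node-timeout` is given in milliseconds (5000 in the question), while `repl-ping-slave-period` is given in seconds and defaults to 10, so the units have to be aligned before adding. A minimal Python sketch of the arithmetic, using the suggested factor of 10 as an example input; `max_data_age_ms` is a hypothetical name, and the ping period is assumed to be left at its 10-second default.

```python
def max_data_age_ms(node_timeout_ms: int, validity_factor: int, repl_ping_period_s: int = 10) -> int:
    """Return (node-timeout * validity-factor) + repl-ping-slave-period in milliseconds.

    cluster-node-timeout is configured in milliseconds, while
    repl-ping-slave-period is configured in seconds (assumed default: 10),
    so the ping period is converted to milliseconds before adding.
    """
    return node_timeout_ms * validity_factor + repl_ping_period_s * 1000


threshold = max_data_age_ms(node_timeout_ms=5000, validity_factor=10)
print(f"{threshold} ms = {threshold / 1000:.0f} s")
```

With those inputs the threshold is 60000 ms, so a slave that heard from its master within the last 60 seconds is not ruled out by this check.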

Therefore, even with a redundant slave, the cluster state changes to DOWN and the cluster becomes unavailable.

You can try a different value, for example the suggested default:

cluster-slave-validity-factor 10

This will ensure that the cluster can tolerate the failure of any one Redis instance, whether it is a slave or a master.

For your second question: six machines with 16GB of RAM each can function as a Redis Cluster of 3 master instances and 3 slave instances. Because each slave only holds a copy of its master's data, the theoretical maximum is 16GB x 3 = 48GB of data. Such a cluster can tolerate at most ONE node failure if cluster-require-full-coverage is turned on; otherwise, it may still be able to serve the data in the shards that remain available on the functioning instances.
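The capacity estimate reduces to a single multiplication. A minimal Python sketch, assuming one Redis instance per machine and no memory reserved for anything else; `max_unique_data_gb` is a hypothetical helper, and real usable capacity will be lower once Redis and OS overhead are accounted for.

```python
def max_unique_data_gb(machines: int, ram_per_machine_gb: int, replicas_per_master: int) -> int:
    """Theoretical upper bound on distinct data the cluster can hold.

    Each replica keeps a full copy of its master's dataset, so only the
    masters' memory counts toward unique data. Per-key overhead,
    replication/AOF buffers and OS memory are ignored here.
    """
    masters = machines // (replicas_per_master + 1)
    return masters * ram_per_machine_gb


print(max_unique_data_gb(machines=6, ram_per_machine_gb=16, replicas_per_master=1), "GB")
```

Adding a second replica per master on the same six machines would drop the bound to 2 x 16GB, which is the usual trade-off between redundancy and capacity.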
