Redis Sentinel manual failover command timesout
I have one Redis master, one slave, and one Sentinel monitoring them. Everything seems to be working properly including failover when the master is killed. But when I issue the SENTINEL FAILVER command Sentinel gets stuck in the state +failover-state-wait-promotion for a few minutes. It seems like the Slave is not getting the promotion command. This doesn't make sense because there doesn't seem to be any trouble with network communication from the Sentinel host to either of the Redis hosts. I'm running all 3 of the procs in Docker containers, but I'm not sure how that could cause the problem. I can run redis-cli from the Sentinel host (i.e. from inside the Docker container) and can remotely execute the slaveof command. I can also monitor both Redis instances and see SENTINEL pings and info requests. I looked at logs for the master and slave and see nothing abnormal. Looking at THIS post and there does not seem to be any reason why Sentinel would consider the Redis instances invalid.
I'm fairly experienced with Sentinel, but rather new to Docker. Not sure how to proceed determining what the problem is. Any ideas?
Sentinel Log
[8] 01 Jul 01:36:57.317 # Sentinel runid is c337f6f0dfa1d41357338591cd0181c07cb026d0
[8] 01 Jul 01:38:13.135 # +monitor master redis-holt-overflow 10.19.8.2 6380 quorum 1
[8] 01 Jul 01:38:13.135 # +set master redis-holt-overflow 10.19.8.2 6380 down-after-milliseconds 3100
[8] 01 Jul 01:38:13.199 * +slave slave 10.19.8.3:6381 10.19.8.3 6381 @ redis-holt-overflow 10.19.8.2 6380
[8] 01 Jul 01:38:42.288 # Executing user requested FAILOVER of 'redis-holt-overflow'
[8] 01 Jul 01:38:42.288 # +new-epoch 1
[8] 01 Jul 01:38:42.288 # +try-failover master redis-holt-overflow 10.19.8.2 6380
[8] 01 Jul 01:38:42.352 # +vote-for-leader c337f6f0dfa1d41357338591cd0181c07cb026d0 1
[8] 01 Jul 01:38:42.352 # +elected-leader master redis-holt-overflow 10.19.8.2 6380
[8] 01 Jul 01:38:42.352 # +failover-state-select-slave master redis-holt-overflow 10.19.8.2 6380
[8] 01 Jul 01:38:42.404 # +selected-slave slave 10.19.8.3:6381 10.19.8.3 6381 @ redis-holt-overflow 10.19.8.2 6380
[8] 01 Jul 01:38:42.404 * +failover-state-send-slaveof-noone slave 10.19.8.3:6381 10.19.8.3 6381 @ redis-holt-overflow 10.19.8.2 6380
[8] 01 Jul 01:38:42.488 * +failover-state-wait-promotion slave 10.19.8.3:6381 10.19.8.3 6381 @ redis-holt-overflow 10.19.8.2 6380
[8] 01 Jul 01:41:42.565 # -failover-abort-slave-timeout master redis-holt-overflow 10.19.8.2 6380
Redis Master Log
[17] 01 Jul 01:13:58.251 # Server started, Redis version 2.8.21
[17] 01 Jul 01:13:58.252 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[17] 01 Jul 01:13:58.252 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
[17] 01 Jul 01:13:58.252 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
[17] 01 Jul 01:13:58.252 * DB loaded from disk: 0.000 seconds
[17] 01 Jul 01:13:58.252 * The server is now ready to accept connections on port 6380
[17] 01 Jul 01:34:45.796 * Slave 10.196.88.30:6381 asks for synchronization
[17] 01 Jul 01:34:45.796 * Full resync requested by slave 10.196.88.30:6381
[17] 01 Jul 01:34:45.796 * Starting BGSAVE for SYNC with target: disk
[17] 01 Jul 01:34:45.797 * Background saving started by pid 20
[20] 01 Jul 01:34:45.798 * DB saved on disk
[20] 01 Jul 01:34:45.799 * RDB: 0 MB of memory used by copy-on-write
[17] 01 Jul 01:34:45.808 * Background saving terminated with success
[17] 01 Jul 01:34:45.808 * Synchronization with slave 10.196.88.30:6381 succeeded
[17] 01 Jul 01:38:42.343 # Connection with slave 10.196.88.30:6381 lost.
[17] 01 Jul 01:38:43.275 * Slave 10.196.88.30:6381 asks for synchronization
[17] 01 Jul 01:38:43.275 * Full resync requested by slave 10.196.88.30:6381
[17] 01 Jul 01:38:43.275 * Starting BGSAVE for SYNC with target: disk
[17] 01 Jul 01:38:43.275 * Background saving started by pid 21
[21] 01 Jul 01:38:43.277 * DB saved on disk
[21] 01 Jul 01:38:43.277 * RDB: 0 MB of memory used by copy-on-write
[17] 01 Jul 01:38:43.368 * Background saving terminated with success
[17] 01 Jul 01:38:43.368 * Synchronization with slave 10.196.88.30:6381 succeeded
Redis Slave Log
[14] 01 Jul 01:15:51.435 # Server started, Redis version 2.8.21
[14] 01 Jul 01:15:51.435 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[14] 01 Jul 01:15:51.435 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
[14] 01 Jul 01:15:51.435 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
[14] 01 Jul 01:15:51.435 * DB loaded from disk: 0.000 seconds
[14] 01 Jul 01:15:51.435 * The server is now ready to accept connections on port 6381
[14] 01 Jul 01:34:45.088 * SLAVE OF 10.196.88.29:6380 enabled (user request)
[14] 01 Jul 01:34:45.947 * Connecting to MASTER 10.196.88.29:6380
[14] 01 Jul 01:34:45.947 * MASTER <-> SLAVE sync started
[14] 01 Jul 01:34:45.948 * Non blocking connect for SYNC fired the event.
[14] 01 Jul 01:34:45.948 * Master replied to PING, replication can continue...
[14] 01 Jul 01:34:45.948 * Partial resynchronization not possible (no cached master)
[14] 01 Jul 01:34:45.948 * Full resync from master: b912b647401917d52742c0eac3ae2f795f59f48f:1
[14] 01 Jul 01:34:45.960 * MASTER <-> SLAVE sync: receiving 18 bytes from master
[14] 01 Jul 01:34:45.960 * MASTER <-> SLAVE sync: Flushing old data
[14] 01 Jul 01:34:45.960 * MASTER <-> SLAVE sync: Loading DB in memory
[14] 01 Jul 01:34:45.960 * MASTER <-> SLAVE sync: Finished with success
[14] 01 Jul 01:38:42.495 # Connection with master lost.
[14] 01 Jul 01:38:42.495 * Caching the disconnected master state.
[14] 01 Jul 01:38:42.495 * Discarding previously cached master state.
[14] 01 Jul 01:38:42.495 * MASTER MODE enabled (user request)
[14] 01 Jul 01:38:42.495 # CONFIG REWRITE executed with success.
[14] 01 Jul 01:38:42.506 * SLAVE OF 10.196.88.29:6380 enabled (user request)
[14] 01 Jul 01:38:43.425 * Connecting to MASTER 10.196.88.29:6380
[14] 01 Jul 01:38:43.426 * MASTER <-> SLAVE sync started
[14] 01 Jul 01:38:43.426 * Non blocking connect for SYNC fired the event.
[14] 01 Jul 01:38:43.427 * Master replied to PING, replication can continue...
[14] 01 Jul 01:38:43.427 * Partial resynchronization not possible (no cached master)
[14] 01 Jul 01:38:43.427 * Full resync from master: b912b647401917d52742c0eac3ae2f795f59f48f:10930
[14] 01 Jul 01:38:43.520 * MASTER <-> SLAVE sync: receiving 18 bytes from master
[14] 01 Jul 01:38:43.520 * MASTER <-> SLAVE sync: Flushing old data
[14] 01 Jul 01:38:43.520 * MASTER <-> SLAVE sync: Loading DB in memory
[14] 01 Jul 01:38:43.520 * MASTER <-> SLAVE sync: Finished with success
Sentinel Config
port 26379
pidfile "/var/run/redis-sentinel.pid"
logfile ""
daemonize noGenerated by CONFIG REWRITE
dir "/data"
sentinel monitor redis-holt-overflow 10.19.8.2 6380 1
sentinel down-after-milliseconds redis-holt-overflow 3100
sentinel config-epoch redis-holt-overflow 0
sentinel leader-epoch redis-holt-overflow 1
sentinel known-slave redis-holt-overflow 10.19.8.3 6381
sentinel current-epoch 1
Redis & Sentinel Info:
redis_version:2.8.21 redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:551c16ab9d912477
redis_mode:standalone
os:Linux 3.10.0-123.8.1.el7.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.7.2
process_id:13
run_id:7e1a1b6c844a969424d16f3efa116707ea7a60bf
tcp_port:6380
uptime_in_seconds:1312
uptime_in_days:0
hz:10
lru_clock:9642428
config_file:/usr/local/etc/redis/redis.conf