8
votes

I have a rails app running which is using redis quite a lot - however - I'm seeing quite a few Redis::TimeoutError occurring here and there, from time to time. There is no pattern in the circumstances. It occurs both in the web app and in the background jobs (which is being processed using sidekiq) - not often but from time to time.

Now I have no idea how to track down the root cause of this and hence no idea how to fix it.

Here is a little background on my setup:

The redis instance is running on a separate physical server which is connected to both my web server and background server in a private local 1Gbit network. All servers are running ubuntu 12.04. The redis version is 2.6.10. I'm connecting from my rails app (which is 3.2) using an initializer like so:

require 'redis'
require 'redis/objects'
REDIS = Redis.new(:url => APP_CONFIG['REDIS_URL'])
Redis.current = REDIS

This is the output of redis-cli INFO:

# Server
redis_version:2.6.10
redis_git_sha1:00000000
redis_git_dirty:0
redis_mode:standalone
os:Linux 3.2.0-38-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.6.3
process_id:28475
run_id:d89bbb1b81d3169c4228cf23c0988ae437d496a1
tcp_port:6379
uptime_in_seconds:14913365
uptime_in_days:172
lru_clock:1507056

# Clients
connected_clients:233
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:19

# Memory
used_memory:801637360
used_memory_human:764.50M
used_memory_rss:594706432
used_memory_peak:4295394784
used_memory_peak_human:4.00G
used_memory_lua:31744
mem_fragmentation_ratio:0.74
mem_allocator:jemalloc-3.3.0

# Persistence
loading:0
rdb_changes_since_last_save:23166
rdb_bgsave_in_progress:0
rdb_last_save_time:1378219310
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:4
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok

# Stats
total_connections_received:932395
total_commands_processed:3088408103
instantaneous_ops_per_sec:837
rejected_connections:0
expired_keys:31428
evicted_keys:3007
keyspace_hits:124093049
keyspace_misses:53060192
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:17651

# Replication
role:master
connected_slaves:1
slave0:192.168.0.2,6379,online

# CPU
used_cpu_sys:54000.21
used_cpu_user:73692.52
used_cpu_sys_children:36229.79
used_cpu_user_children:420655.84

# Keyspace
db0:keys=1498962,expires=1310

In my redis config I have the following set:

\fidaemonize yes
pidfile /var/run/redis/redis-server.pid
timeout 0
loglevel notice
databases 1
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /var/lib/redis
slave-serve-stale-data yes
slave-read-only yes
slave-priority 100
maxclients 1000
maxmemory 4GB
maxmemory-policy volatile-lru
appendonly no
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60 
4
So do the timeout errors come in batches or sporadically. - Michael Papile
also what does redis-cli --latency yield as the max latency to your host when run for a while? - Michael Papile
@MichaelPapile I think they might be occurring in small batches - but not sure and not very clearly - if so - it's not at all like everything is failing at curtain points or anything - Niels Kristian
The latency call shows: min: 0, max: 2, avg: 0.26 (2682 samples) - Niels Kristian
Do you use Unicorn? If so it might be related to this: github.com/resque/resque/issues/1018 there is also this explanation github.com/redis/redis-rb/issues/340 not sure if these would help you though - j03w

4 Answers

7
votes

That could come from many issues :

  • because you use the SAVE command (it is setup in your conf) generating a lot of I/O and hammering the server, especially if you use EBS volumes on Amazon.
  • because you have a Redis slave (same as before, doing SAVE before mirroring).
  • because you use a KEY * which is very slow on a lot of indexes.
3
votes
  1. Try "slowlog" command on the redis server to see if there are some "slow query".
  2. Write some logs when "TimeoutError" happens, to see if the "error redis command" in the "slow log".
  3. adjust your timeout setting on the client side。
1
votes

It might be a problem on the client side if server performs normally. Each redis client instance, not the server, also has a timeout setting, and the default setting is very short - something like a few milliseconds. So if the server does not respond within that time, a Redis::TimeoutError will be raised by the client.

First thing you can try is to set a longer timeout value, and see if things get better.

redis_url = 'redis://user:password@host:port/'
redis = Redis.connect(:url => redis_url, :timeout => 0.7)

Even with longer timeout setting, there is no guarantee that timeout would not happen, but then it'd be a problem of the design of your system.

0
votes

Are you rolling your own code to connect to redis or just letting sidekiq handle it? I think you should really just design your connection code to reconnect if the connection has been lost. You can rescue Redis::BaseConnectionError and reconnect.