
This is the current setup:

One Redis server on the intranet and two Node.js servers, each running 8 PM2 instances, in the DMZ. There is a firewall between the Node servers and the Redis server. We want to know why PM2 keeps restarting roughly every hour, and why the number of connections on the Redis server keeps increasing. We looked at the PM2 logs; some of the errors we found are:

PM2 errors

This is redis conf:

# General
daemonize yes
pidfile "/var/run/redis/6379.pid"
dir "/apps/redis/6379"
port 6379
bind 0.0.0.0
timeout 0
tcp-keepalive 0
tcp-backlog 511
loglevel notice
logfile "/apps/log/redis/reids.log"
syslog-enabled yes
syslog-ident "redis_6379"
syslog-facility user
databases 16

# Snapshotting
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename "dump.rdb"

# Replication
slave-serve-stale-data yes
slave-read-only yes
repl-disable-tcp-nodelay no
slave-priority 100
min-slaves-max-lag 10
# Security
# Limits
maxclients 10000
maxmemory-policy noeviction

# Append Only Mode
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

# Lua
lua-time-limit 5000

# Slow Log
slowlog-log-slower-than 10000
slowlog-max-len 128

# Event Notification
notify-keyspace-events ""

# Advanced
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
# Generated by CONFIG REWRITE

This is the erratic behaviour observed in PM2 and netstat:

...
...
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:hacl-gs    ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:13039      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:23784      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:32191      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:18506      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:30892      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:26435      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:29383      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:28424      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:18174      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:3364       ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:28449      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:20975      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:23668      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:21041      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:23206      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:14224      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:29325      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:28642      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:aap        ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:14575      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:11651      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:5982       ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:10445      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:22765      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:12192      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:8926       ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:32150      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:10471      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:29948      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:25831      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:28012      ESTABLISHED
tcp        0      0 vidmarredispdb4.te:6379 10.54.22.226:5119       ESTABLISHED
[apps@vidmarredispdb4 redis]$ netstat | grep 6379 | grep 226 | wc -l
168
[apps@vidmarredispdb4 redis]$ netstat | grep 6379 | grep 227 | wc -l
40

PM2 restart counts and uptimes are erratic:


[apps@vidmarnjspapp1 ~]$ pm2 list
┌─────────────┬────┬─────────┬───────┬────────┬─────────┬────────┬─────┬───────────┬──────────┐
│ App name    │ id │ mode    │ pid   │ status │ restart │ uptime │ cpu │ mem       │ watching │
├─────────────┼────┼─────────┼───────┼────────┼─────────┼────────┼─────┼───────────┼──────────┤
│ Pm2Http9615 │ 0  │ fork    │ 23300 │ online │ 1       │ 39h    │ 0%  │ 25.1 MB   │ disabled │
│ video-api   │ 1  │ cluster │ 13434 │ online │ 3       │ 2h     │ 0%  │ 67.7 MB   │ disabled │
│ video-api   │ 2  │ cluster │ 23306 │ online │ 1       │ 39h    │ 0%  │ 77.8 MB   │ disabled │
│ video-api   │ 3  │ cluster │ 23312 │ online │ 1       │ 39h    │ 0%  │ 71.5 MB   │ disabled │
│ video-api   │ 4  │ cluster │ 6073  │ online │ 4       │ 20h    │ 0%  │ 68.1 MB   │ disabled │
│ video-api   │ 5  │ cluster │ 18850 │ online │ 2       │ 24h    │ 0%  │ 74.8 MB   │ disabled │
│ video-api   │ 6  │ cluster │ 23342 │ online │ 1       │ 39h    │ 0%  │ 74.2 MB   │ disabled │
│ video-api   │ 7  │ cluster │ 27569 │ online │ 2       │ 28h    │ 0%  │ 71.5 MB   │ disabled │
│ video-api   │ 8  │ cluster │ 23391 │ online │ 1       │ 39h    │ 0%  │ 78.2 MB   │ disabled │
└─────────────┴────┴─────────┴───────┴────────┴─────────┴────────┴─────┴───────────┴──────────┘

Here are tcpdump logs:

TCP Dump

Node version: 6.11.2, Redis version: 3.2.9, PM2 version: 2.4.6

Any help in understanding this erratic behaviour regarding the number of connections would be highly appreciated.

Just updating with some of the errors we are getting in the Node server logs for the Redis connection: Error while connecting to structures DB 180102/072103.855, [log,auth,error] data: {"screen":"browse","error":{"code":"UNCERTAIN_STATE","command":"SELECT","args":[0],"origin":{"code":"ETIMEDOUT","errno":"ETIMEDOUT","syscall":"read"}} - Aritra Ghosh
[log,connection,client,error] message: read ECONNRESET, stack: Error: read ECONNRESET at exports._errnoException (util.js:1020:11) at TCP.onread (net.js:568:26) 171220/111905.037, [log,connection,client,error] message: read ECONNRESET, stack: Error: read ECONNRESET at exports._errnoException (util.js:1020:11) at TCP.onread (net.js:568:26) - Aritra Ghosh
Are these an indication of a problem with server connectivity, or perhaps some issue in the Node.js code? - Aritra Ghosh

1 Answer


Possible Bugs:

  1. No error handling for the Node process, for example:

process.on('uncaughtException', function (e) {
  console.log('Uncaught Exception...');
  console.log(e.stack);
  // disconnect Redis and any other connections here before exiting
  process.exit(99);
});
  2. Close all connections (DB, Redis, etc.) on error.
  3. Open the connection in global scope, not in the callback of each request.

Ensure that you create the Redis connection only once and reuse it for every request. Close the Redis connection when the Node process encounters a fatal error.
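A minimal sketch of what that could look like with the node_redis client; the module name redisClient.js, the host, and the retry limits below are assumptions for illustration, not your actual setup:

// redisClient.js - create the connection once, at module load, and share it everywhere
const redis = require('redis');

const client = redis.createClient({
  host: 'vidmarredispdb4',   // placeholder: your Redis host
  port: 6379,
  // back off and retry instead of letting a dropped socket linger or crash the process
  retry_strategy: function (options) {
    if (options.attempt > 10) {
      return new Error('Redis retry attempts exhausted');
    }
    return Math.min(options.attempt * 100, 3000); // ms to wait before the next attempt
  }
});

client.on('error', function (err) {
  // without an 'error' listener the event is thrown, the process dies and PM2 restarts it
  console.error('Redis error:', err.message);
});

module.exports = client;

Every request handler then does require('./redisClient') and reuses the same connection instead of opening a new one per request, which is what the ever-growing ESTABLISHED count on the Redis side suggests is happening now.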

Let me know how you are creating the Redis connection in your codebase.

There might also be a memory leak contributing to this, because the pm2 list output shows memory usage growing with uptime, for example:

after 2 hours:  │ 2h  │ 0% │ 67.7 MB
after 39 hours: │ 39h │ 0% │ 78.2 MB
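If you want to confirm whether the heap really keeps growing, one low-effort debugging sketch (the interval and log format here are just assumptions, adjust as needed) is to log process.memoryUsage() periodically so the trend shows up in the PM2 logs:

// log heap usage every 5 minutes; steady growth over many hours points to a leak
setInterval(function () {
  var mem = process.memoryUsage();
  console.log('heapUsed=' + Math.round(mem.heapUsed / 1048576) + 'MB' +
              ' rss=' + Math.round(mem.rss / 1048576) + 'MB');
}, 5 * 60 * 1000);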