I've moved my server from apache2+fcgi to nginx+fpm because I wanted a lighter environment, and apache's memory footprint was high. The server is a dual core (I know, not very much) with 8G of ram. It runs also a rather busy FreeRadius server and related MySQL. CPU load average is ~1, with some obvious peaks.
One of those peaks happens every 30 minutes when I get web pings from some controlled devices. With Apache the server load was spiking up a lot, slowing down everything. Now with nginx the process is much faster (I also did some optimization in the code), tough now I miss some of these connections. I configured both nginx and fpm to what I believe should be enough, but I must be missing something because in these moments php isn't (apparently) able to reply to nginx. This is a recap of the config:
nginx/1.8.1
user www-data;
worker_processes auto;
pid /var/run/nginx.pid;
events {
worker_connections 1024;
# multi_accept on;
}
client_body_buffer_size 10K;
client_header_buffer_size 1k;
client_max_body_size 20m;
large_client_header_buffers 2 1k;
location ~ \.php$ {
fastcgi_split_path_info ^(.+\.php)(.*)$;
set $fsn /$yii_bootstrap;
if (-f $document_root$fastcgi_script_name){
set $fsn $fastcgi_script_name;
}
fastcgi_pass 127.0.0.1:9011;
include fastcgi_params;
fastcgi_param SCRIPT_FILENAME $document_root$fsn;
fastcgi_param PATH_INFO $fastcgi_path_info;
fastcgi_param PATH_TRANSLATED $document_root$fsn;
fastcgi_read_timeout 150s;
}
php5-fpm 5.4.45-1~dotdeb+6.1
[pool01]
listen = 127.0.0.1:9011
listen.allowed_clients = 127.0.0.1
pm = dynamic
pm.max_children = 150
pm.start_servers = 2
pm.min_spare_servers = 2
pm.max_spare_servers = 8
pm.max_requests = 2000
pm.process_idle_timeout = 10s
When the peak arrives I start seeing this in fpm logs:
[18-Feb-2016 11:30:04] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 c
hildren, there are 0 idle, and 13 total children
[18-Feb-2016 11:30:05] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16
children, there are 0 idle, and 15 total children
[18-Feb-2016 11:30:06] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32
children, there are 0 idle, and 17 total children
[18-Feb-2016 11:30:07] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32
children, there are 0 idle, and 19 total children
and worse in nginx's error.log
2016/02/18 11:30:22 [error] 23400#23400: *209920 connect() failed (110: Connection timed out) while connecting to upstream, client: 79.1.1.9,
server: host.domain.com, request: "GET /ping/?whoami=abc02 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
2016/02/18 11:30:22 [error] 23400#23400: *209923 connect() failed (110: Connection timed out) while connecting to upstream, client: 1.1.9.71,
server: host.domain.com, request: "GET /utilz/pingme.php?whoami=abc01 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
2016/02/18 11:30:22 [error] 23400#23400: *209925 connect() failed (110: Connection timed out) while connecting to upstream, client: 3.7.0.4,
server: host.domain.com, request: "GET /ping/?whoami=abc03 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
2016/02/18 11:30:22 [error] 23400#23400: *209926 connect() failed (110: Connection timed out) while connecting to upstream, client: 1.7.2.1
, server: host.domain.com, request: "GET /ping/?whoami=abc04 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
Those connections are lost!
First question, why nginx returns timeout within 22s (the pings are made at 00 and 30 minutes of every hour) if fastcgi_read_timeout is set to 150s?
Second question: why do I get so many fpm warnings? The total children displayed never reaches pm.max_children. I know warnings are not errors, but I get warned... Is there a relation between those messages and nginx's timeouts?
Given that the server handles perfectly fine the regular traffic, and it has no problem with ram and swap neither during these peak times (it always has ~1.5G or more free), is there a better tuning to handle those ping connections (which doesn't involve changing the schedule)? Should I raise pm.start_servers and/or pm.min_spare_servers?
fastcgi_connect_timeoutcould be the interesting config! - Maxxerfastcgi_connect_timeoutinstead offastcgi_read_timeout. The latest only applies to when php replies and takes too long to complete, while the first is obviously to start the connection. - peixotorms