I've moved my server from apache2+fcgi to nginx+fpm because I wanted a lighter environment and Apache's memory footprint was high. The server is a dual core (I know, not very much) with 8G of RAM. It also runs a rather busy FreeRadius server and the related MySQL. CPU load average is ~1, with some obvious peaks.

One of those peaks happens every 30 minutes, when I get web pings from some controlled devices. With Apache the server load spiked a lot, slowing everything down. Now with nginx the process is much faster (I also did some optimization in the code), though now I miss some of these connections. I configured both nginx and fpm to what I believe should be enough, but I must be missing something, because in these moments PHP is (apparently) unable to reply to nginx. This is a recap of the config:

nginx/1.8.1

user www-data;
worker_processes auto;
pid /var/run/nginx.pid;

events {
        worker_connections  1024;
        # multi_accept on;
}

client_body_buffer_size 10K;
client_header_buffer_size 1k; 
client_max_body_size 20m;
large_client_header_buffers 2 1k; 
location ~ \.php$ {
  fastcgi_split_path_info  ^(.+\.php)(.*)$;
  set $fsn /$yii_bootstrap;
  if (-f $document_root$fastcgi_script_name){
    set $fsn $fastcgi_script_name;
  }

  fastcgi_pass 127.0.0.1:9011;
  include fastcgi_params;
  fastcgi_param  SCRIPT_FILENAME  $document_root$fsn;
  fastcgi_param  PATH_INFO        $fastcgi_path_info;
  fastcgi_param  PATH_TRANSLATED  $document_root$fsn;
  fastcgi_read_timeout 150s;
}

php5-fpm 5.4.45-1~dotdeb+6.1

[pool01]
listen = 127.0.0.1:9011
listen.allowed_clients = 127.0.0.1
pm = dynamic
pm.max_children = 150 
pm.start_servers = 2 
pm.min_spare_servers = 2 
pm.max_spare_servers = 8 
pm.max_requests = 2000
pm.process_idle_timeout = 10s 

When the peak arrives I start seeing this in fpm logs:

[18-Feb-2016 11:30:04] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 13 total children
[18-Feb-2016 11:30:05] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 0 idle, and 15 total children
[18-Feb-2016 11:30:06] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 17 total children
[18-Feb-2016 11:30:07] WARNING: [pool pool01] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 19 total children

and, worse, this in nginx's error.log:

2016/02/18 11:30:22 [error] 23400#23400: *209920 connect() failed (110: Connection timed out) while connecting to upstream, client: 79.1.1.9, server: host.domain.com, request: "GET /ping/?whoami=abc02 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
2016/02/18 11:30:22 [error] 23400#23400: *209923 connect() failed (110: Connection timed out) while connecting to upstream, client: 1.1.9.71, server: host.domain.com, request: "GET /utilz/pingme.php?whoami=abc01 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
2016/02/18 11:30:22 [error] 23400#23400: *209925 connect() failed (110: Connection timed out) while connecting to upstream, client: 3.7.0.4, server: host.domain.com, request: "GET /ping/?whoami=abc03 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"
2016/02/18 11:30:22 [error] 23400#23400: *209926 connect() failed (110: Connection timed out) while connecting to upstream, client: 1.7.2.1, server: host.domain.com, request: "GET /ping/?whoami=abc04 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9011", host: "host.domain.com"

Those connections are lost!

First question: why does nginx report a timeout within 22s (the pings are sent at minutes 00 and 30 of every hour) if fastcgi_read_timeout is set to 150s?

Second question: why do I get so many fpm warnings? The total number of children shown never reaches pm.max_children. I know warnings are not errors, but I still get warned... Is there a relation between those messages and nginx's timeouts?

Given that the server handles the regular traffic perfectly well, and that it has no problem with RAM or swap even during these peaks (it always has ~1.5G or more free), is there a better tuning to handle those ping connections (one that doesn't involve changing the schedule)? Should I raise pm.start_servers and/or pm.min_spare_servers?

The connection to the upstream will time out if PHP is not responding to nginx, even with a higher timeout on fpm. This timeout is decided by nginx, not by the upstream provider (because it could be down). - peixotorms
According to the docs fastcgi_read_timeout is the time nginx waits for the upstream server... - Maxxer
but fastcgi_connect_timeout could be the interesting config! - Maxxer
Sorry, I was thinking of fastcgi_connect_timeout instead of fastcgi_read_timeout. The latter only applies when PHP replies and takes too long to complete, while the former applies to establishing the connection. - peixotorms
It looks like you're executing fastcgi even for static files, so PHP might be overloaded. I didn't see any location blocks, so I'm not sure. - peixotorms

1 Answer

You need a few changes, and I would also recommend upgrading your PHP to 5.6.

Nginx tuning: /etc/nginx/nginx.conf

user www-data;
pid /var/run/nginx.pid;
error_log /var/log/nginx/error.log crit;

# NOTE: Max simultaneous requests =    worker_processes*worker_connections/(keepalive_timeout*2)
worker_processes 1;
worker_rlimit_nofile 750000;

# handles connection stuff
events { 
worker_connections 50000;
multi_accept on;
use epoll;
}


# http request stuff
http {
access_log off;
log_format  main  '$remote_addr $host $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $ssl_cipher $request_time';

types_hash_max_size 2048;
server_tokens off;
fastcgi_read_timeout 180;
keepalive_timeout 20;
keepalive_requests 1000;
reset_timedout_connection on;
client_body_timeout 20;
client_header_timeout 10;
send_timeout 10;
tcp_nodelay on;
tcp_nopush on;
sendfile on; 
directio 100m;
client_max_body_size 100m;
server_names_hash_bucket_size 100;
include /etc/nginx/mime.types;
default_type application/octet-stream;

# index default files
index              index.html index.htm index.php;

# php-fpm upstream (only php5-fpm is listed here; no hhvm server is configured)
upstream php {
        keepalive 30;
        server 127.0.0.1:9001; # php5-fpm (check your port)
}

# Virtual Host Configs
include /etc/nginx/sites-available/*;

}
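
As a quick check of the NOTE comment in the config above, here is a minimal sketch that plugs the listed values into that rule of thumb. The formula is taken at face value from the comment; it is the answer's own estimate, not a limit documented by nginx, and the function name is made up for illustration.

```python
def max_simultaneous_requests(worker_processes: int,
                              worker_connections: int,
                              keepalive_timeout: int) -> float:
    """Rule of thumb from the config comment (an estimate, not an nginx guarantee)."""
    return worker_processes * worker_connections / (keepalive_timeout * 2)


if __name__ == "__main__":
    # Values from the config above: worker_processes 1,
    # worker_connections 50000, keepalive_timeout 20.
    print(max_simultaneous_requests(1, 50000, 20))
```

With these settings the estimate comes to 1250, far above the few dozen concurrent pings seen in the question's logs, so under this rule of thumb nginx itself is unlikely to be the limiting side.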

For a default server, create /etc/nginx/sites-available/default.conf with the following:

# default virtual host
server {
listen   80;
server_name localhost;
root /path/to/your/files;
access_log        off;
log_not_found     off;

# handle static files first
location / {
index index.html index.htm index.php ;
}

# serve static content directly by nginx without logs
location ~* \.(jpg|jpeg|gif|png|bmp|css|js|ico|txt|pdf|swf|flv|mp4|mp3)$ {
access_log off;
log_not_found off;
expires   7d;

# Enable gzip for some static content only
gzip on;
gzip_comp_level 6;
gzip_vary on;
gzip_types text/plain text/css application/json application/x-javascript application/javascript text/javascript image/svg+xml application/vnd.ms-fontobject application/x-font-ttf font/opentype;
}

# no cache for xml files
location ~* \.(xml)$ {
access_log off;
log_not_found off;
expires   0s;
add_header Pragma no-cache;
add_header Cache-Control "no-cache, no-store, must-revalidate, post-check=0, pre-check=0";
gzip on;
gzip_comp_level 6;
gzip_vary on;
gzip_types text/plain text/xml application/xml application/rss+xml;
}


# run php only when needed
location ~ \.php$ {
# basic php params
fastcgi_pass php; 
fastcgi_index   index.php;
fastcgi_keep_conn on;
fastcgi_connect_timeout 20s; 
fastcgi_send_timeout 30s; 
fastcgi_read_timeout 30s;

# fast cgi params
include fastcgi_params;
fastcgi_param  SCRIPT_FILENAME    $document_root$fastcgi_script_name;
fastcgi_param  SCRIPT_NAME        $fastcgi_script_name;
fastcgi_param  QUERY_STRING       $query_string;
fastcgi_param  REQUEST_METHOD     $request_method;
fastcgi_param  CONTENT_TYPE       $content_type;
fastcgi_param  CONTENT_LENGTH     $content_length;
fastcgi_param  REQUEST_URI        $request_uri;
fastcgi_param  DOCUMENT_URI       $document_uri;
fastcgi_param  DOCUMENT_ROOT      $document_root;
fastcgi_param  REMOTE_ADDR        $remote_addr;
fastcgi_param  REMOTE_PORT        $remote_port;
}

}

Ideally, you want php5-fpm to restart automatically if it starts failing, so add this to /etc/php5/fpm/php-fpm.conf

emergency_restart_threshold = 60
emergency_restart_interval = 1m
process_control_timeout = 10s

Change /etc/php5/fpm/pool.d/www.conf

[www]
user = www-data
listen.owner = www-data
listen.group = www-data
listen.mode = 0660
listen = 127.0.0.1:9001
listen.allowed_clients = 127.0.0.1
listen.backlog = 65000
pm = dynamic
pm.start_servers = 8
pm.min_spare_servers = 4
pm.max_spare_servers = 16

; max number of simultaneous requests that will be served (if each PHP page needs 32 MB, then 128 x 32 MB = 4 GB of RAM)
pm.max_children = 128

; Keep this high (10k to 50k) to avoid frequent child respawns; however, if the PHP code leaks memory, a high value will become a problem.
pm.max_requests = 10000
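
Two quick sanity checks on these pool numbers, as a rough sketch rather than a description of fpm internals. The first redoes the memory arithmetic from the pm.max_children comment and compares it with the ~1.5G the question reports as free at peak; the 32 MB per child is the hypothetical figure from that comment, not a measurement. The second models the ramp-up seen in the question's fpm log, where the total went 13, 15, 17, 19, i.e. +2 per second, which matches pm.min_spare_servers = 2. It assumes the pool keeps growing by pm.min_spare_servers per second while no child is idle. All function names are made up for illustration.

```python
import math
from typing import List


def pool_memory_mb(max_children: int, mb_per_child: float) -> float:
    """Worst-case memory if every child is busy at the same time."""
    return max_children * mb_per_child


def children_that_fit(free_mb: float, mb_per_child: float) -> int:
    """How many children fit in the memory that is actually free."""
    return int(free_mb // mb_per_child)


def ramp_up_totals(start_total: int, step: int,
                   max_children: int, seconds: int) -> List[int]:
    """Total children after each second, ASSUMING the pool grows by `step`
    per second (the pattern in the question's log) and is capped at max_children."""
    totals = [start_total]
    for _ in range(seconds):
        totals.append(min(totals[-1] + step, max_children))
    return totals


def seconds_to_reach(start_total: int, step: int, target: int) -> int:
    """Seconds needed to grow from start_total to target at `step` per second."""
    return math.ceil(max(target - start_total, 0) / step)


if __name__ == "__main__":
    print(pool_memory_mb(128, 32))        # worst case for the pool above, in MB
    print(children_that_fit(1536, 32))    # children fitting in ~1.5G free
    print(ramp_up_totals(13, 2, 150, 3))  # reproduces the logged totals
    print(seconds_to_reach(13, 2, 150))   # question's settings
    print(seconds_to_reach(13, 4, 128))   # settings above
```

Under these assumptions the original pool would need over a minute to grow from 13 to 150 children, while the burst of pings arrives within seconds, so raising pm.start_servers and pm.min_spare_servers matters more than pm.max_children. Measure the real per-child memory before settling on pm.max_children, since 128 children at 32 MB each would not fit in 1.5G of free RAM.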