2
votes

I am load testing a t2.micro box that has nginx and PostgREST running in Docker containers, with nginx acting as a proxy in front of PostgREST. If I go directly to the upstream (PostgREST) I get a nice graph (it peaks at about 900 rps). If I go through nginx, I get this kind of graph:

[throughput graph: requests per second periodically drop to near zero when going through nginx]

The CPU is not maxed out (only about 50%).

This is the nginx config used. Everything that is commented out has been tried with no impact, and I also played with the values of worker_connections and related settings. What could be triggering this periodic drop?

    worker_processes  2;

    #worker_rlimit_nofile              2048;
    events {
        #  multi_accept                    on;
        worker_connections              1024;
        use                             epoll;
    }
    http {
        resolver 127.0.0.11 ipv6=off;
        include mime.types;
        #tcp_nodelay                     off;
        #tcp_nopush                      on;
        upstream postgrest {
            server postgrest:3000;
            keepalive 64;
        }
        server {
            listen       80;
            server_name  localhost;
            charset utf-8;

            location /rest/ {
                default_type  application/json;
                #proxy_buffering off;
                proxy_pass http://postgrest/; # Reverse proxy to your PostgREST
            }
        }
    }
The graph for latency is also interesting: it stays at about 10 ms up until the first drop, then jumps to about 1 s and mostly stays there, mirroring the throughput graph a bit, but it never comes back down to 10 ms. Also, there are no errors; every request is 200 OK. – Ruslan Talpa
A new development: if I make the containers use the host network, the performance is worse and the drop goes almost to 0. Does anyone know how to compare the network settings for host vs bridge mode to see which parameters differ? – Ruslan Talpa
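
One way to compare the two (a sketch, not from the thread; it assumes the container image ships a `sysctl` binary, e.g. from busybox or procps, and `<container>` is a placeholder for your container name) is to dump the per-namespace net settings from inside the bridged container and diff them against the host:

    # on the host
    sysctl -a 2>/dev/null | grep '^net\.' | sort > host-net.txt

    # inside the bridged container's network namespace
    docker exec <container> sh -c "sysctl -a 2>/dev/null | grep '^net\.'" | sort > bridge-net.txt

    # show which parameters differ between the two namespaces
    diff host-net.txt bridge-net.txt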

1 Answer

2
votes

The culprit was the (default) kernel TCP settings. When going through the nginx proxy, the system was using up all of the local ports, and then everything stopped (the drop) until the old TCP connections could be closed completely (they sat in TIME_WAIT for 60 s). Tuning the settings listed below removed the problem.
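
A rough back-of-envelope calculation (my own numbers, assuming the common Linux default ephemeral port range of 32768-60999, i.e. roughly 28,000 ports, and one new upstream connection per request since upstream keepalive was not actually in effect) shows why the drops are periodic:

    28000 ports / 60 s TIME_WAIT  ≈ 470 sustainable new connections per second
    28000 ports / 900 rps         ≈ 31 s until the ephemeral ports run out

At about 900 rps the proxy exhausts its local ports in roughly half a minute, then stalls until the TIME_WAIT sockets start expiring.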

    #tcp settings
    net.core.somaxconn
    net.ipv4.tcp_fin_timeout
    net.ipv4.tcp_tw_reuse
    net.ipv4.ip_local_port_range

    #nginx configs
    proxy_set_header  Connection "";
    proxy_http_version 1.1;
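
As a minimal sketch (the values are examples of mine, not from the answer, and should be tuned for your own workload), the kernel side could be set in a sysctl file and loaded with `sysctl --system`:

    # /etc/sysctl.d/99-proxy-tuning.conf -- example values, adjust as needed
    net.core.somaxconn = 4096
    net.ipv4.tcp_fin_timeout = 15
    net.ipv4.tcp_tw_reuse = 1
    net.ipv4.ip_local_port_range = 1024 65000

The two nginx directives belong next to proxy_pass so that the keepalive pool declared in the upstream block is actually used (by default nginx speaks HTTP/1.0 to upstreams and sends Connection: close, opening a new connection per request):

    location /rest/ {
        default_type  application/json;
        proxy_http_version 1.1;
        proxy_set_header   Connection "";
        proxy_pass http://postgrest/;
    }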

The article below goes into more detail about exactly what was going on and which parameters to tune.

https://engineering.gosquared.com/optimising-nginx-node-js-and-networking-for-heavy-workloads