Issues with nginx limit_req rate limiting - docs clarification?

Question

I am having no end of trouble getting rate limiting to work on nginx with passenger/rails.

Part of the confusion comes with distinguishing between which aspects of the config work on a per-client basis and which are global limits.

I'm having trouble getting my head around the ideal setup for nginx's limit_req and limit_req_zone configs. The documentation seems to flip-flop between language hinting that the limits are user-specific and language hinting that they apply globally.

In the docs it is quite vague exactly how the limit_req_zone line works. Is this 'zone' global or per-user? Given the following line am I right in the following conclusions:

limit_req_zone $binary_remote_addr zone=update_requests:1m rate=20r/s;

1. $binary_remote_addr represents a user's IP address
2. This representation in particular is preferable because it takes up less space than $remote_addr? Why is this important or preferable?
3. The 'zone' (in this case) is filled up with representations of their IP address...?
4. 'rate' is the rate at which requests are allowed to leave the queue?
5. This 'rate' and 'zone' - are they client-specific or global?

I'm also unsure about the limit_req line, e.g. for this:

limit_req zone=main_site burst=10 nodelay;

1. Not entirely sure what burst means. The docs are vague here too. I guess this is a number of requests. Why number of requests, when the rest of the requests system uses this bizarre 'zone' system?
2. 'burst' requests are per....what timeframe?
3. 'nodelay', as far as I understand, is meant to serve a 503 error immediately if they have other requests in the queue, rather than waiting for the queue to finish. a) wait how long? b) does this mean that the 'burst' setting is ignored in this case?

Thanks.

Some background info in case anyone is really bored and wants to have a look at the config and general issues we're trying to resolve:

At the moment I have this (extract):

limit_req_zone $binary_remote_addr zone=main_site:10m rate=40r/s;
limit_req_zone $binary_remote_addr zone=update_requests:1m rate=20r/s;

server {
  listen        80;
  server_name   [removed];
  root          [removed];
  include       rtmp_proxy_settings;

  try_files $uri /system/maintenance.html @passenger;
  location @passenger {
    passenger_max_request_queue_size 0; # 256;
    limit_rate_after 2048k;
    limit_rate 512k;
    limit_req zone=main_site burst=10 nodelay;
    limit_conn addr 5;
    passenger_enabled on;
    passenger_min_instances 3;
  }

  location ~ ^/update_request {
    passenger_enabled on;
    limit_req zone=update_requests burst=5 nodelay;
  }


  gzip on;
  gzip_min_length 1000;
  gzip_proxied expired no-cache no-store private auth;
  gzip_types text/plain application/xml application/javascript text/javascript text/css;
  gzip_disable "msie6";
  gzip_http_version 1.1;
}

We have two zones defined:

a) "main_site", designed to catch everything
b) "update_requests", which JS on the client polls via AJAX for updated content when a timestamp in a small (cached) file changes

By its nature this tends to mean that we have fairly low traffic for 1 or 2 minutes but then a massive spike when potentially 10,000 clients all hit the server at once for this updated content (served from the DB in a slightly different way depending on filters, access permissions, etc).

We were finding that during times of heavy load the site was grinding to a halt when the CPU cores were maxed out. We had a few bugs in our updating code, which meant that when the connection was dropped the queries queued up and kept bogging the server down until we had to take the site down temporarily and force users to log out and refresh their browsers. Effectively we DDoS'd ourselves :P I think this was originally caused by some connectivity issues on our hosting company's side causing a bunch of requests to queue up in the users' browsers.

While we ironed out the bugs we warned clients that they might receive the odd 503 "heavy load" message or see the content not updating in a timely fashion. The original intent of the rate limiting was to ensure that the everyday pages of the site could continue to be navigated around even during heavy load, while rate limiting the updated content.

However, the main issue we are seeing now is that even after the bugs in the updating code have been (hopefully) ironed out, we can't quite strike a good balance on the rate limiting. Everything we set seems to generate an unhealthy number of 503 errors in the access logs whenever a new piece of content is added to the site (and pulled by our users all at once).

We are looking at various solutions here in terms of caching but ideally we would still like to be protected by some kind of rate limiting which doesn't affect users during day to day operations.

Clinton Blackburn · Accepted Answer · 2014-10-23T13:01:06

Which docs are you reading? http://nginx.org/en/docs/http/ngx_http_limit_req_module.html is pretty clear regarding the usage and syntax of the directives.

Regarding limit_req_zone:

1. Yes.
2. In your example, you are allocating 1MB of space to store the list of "current number of excessive requests". The less space each item/key uses, the more you can store. "If the zone storage is exhausted, the server will return the 503 (Service Temporarily Unavailable) error to all further requests."
3. You need to keep track of which clients should be rate-limited, so the zone holds one state per key (here, per client IP address).
4. Rate is the maximum number of requests a client can make in a specified period of time.
5. The context for limit_req_zone is limited to http, so the zone itself is defined globally. The rate, however, is tracked separately for each key in the zone, which here means per client IP address.
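
To make the zone/key distinction more concrete, here is a rough Python model. It is only a sketch, not nginx's actual implementation: the zone is a single shared table, and each distinct key (each client IP, when the key is $binary_remote_addr) gets its own entry in it. The 128-byte state size is the figure the nginx docs give for 64-bit platforms; the Zone class, STATE_BYTES and touch() are names made up for this illustration.

```python
# Simplified model of a limit_req_zone: one shared table, one state per key.
# Zone, STATE_BYTES and touch() are illustrative names, not nginx internals.

STATE_BYTES = 128  # assumed per-key state size (64-bit platforms, per the nginx docs)


class Zone:
    def __init__(self, name, size_bytes, rate_per_sec):
        self.name = name
        self.rate_per_sec = rate_per_sec            # applied to each key separately
        self.capacity = size_bytes // STATE_BYTES   # rough number of keys that fit
        self.states = {}                            # key (client IP) -> requests seen

    def touch(self, client_ip):
        """Record a request; return False if there is no room for a new key.

        Real nginx also cleans up stale entries; that is left out here.
        """
        if client_ip not in self.states and len(self.states) >= self.capacity:
            return False
        self.states[client_ip] = self.states.get(client_ip, 0) + 1
        return True


zone = Zone("update_requests", size_bytes=1024 * 1024, rate_per_sec=20)
for ip in ["203.0.113.1", "203.0.113.2", "203.0.113.1"]:
    zone.touch(ip)

print(zone.capacity)  # number of keys a 1m zone can hold under these assumptions
print(zone.states)    # one entry per client: each IP is tracked on its own
```

If the 128-byte figure holds for your platform, a 1m zone has room for roughly 8,000 clients, which is worth comparing against the 10,000 simultaneous clients mentioned in the question.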

Regarding limit_req:

1. Once a client has reached the rate limit, the client can continue making requests; however, the server will delay processing (in an attempt to slow down the client). If the client continues to make requests above the rate limit, and the number of excess requests goes beyond burst, the server will reject the further requests with a 503 error (instead of slowing them down). One might use this in an effort to fend off DoS attacks or API abuse.
2. Burst requests are not time-dependent. Burst only kicks in if the client is over the rate limit.
3. nodelay removes the delay for processing excess requests that fit within the burst value; requests beyond burst are still rejected. If you want no processing for any rate-limited clients, set burst to 0 and use nodelay. The wait/delay for rate-limited clients depends on the rate limit specified by limit_req_zone.
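
The interaction between rate, burst and nodelay is easier to see in a small simulation. The following is a simplified leaky-bucket model for a single key (one client), written from the documented behaviour rather than from nginx's source; the simulate() function and its parameters are hypothetical names for this sketch.

```python
# Simplified leaky-bucket model of limit_req for a single key (one client).
# A sketch of the documented behaviour, not nginx's source code.


def simulate(arrivals, rate, burst, nodelay):
    """arrivals: sorted request times in seconds.

    Returns a list of (arrival_time, outcome), where outcome is "503"
    for a rejected request or the delay in seconds for a served one.
    """
    excess = 0.0   # how far the client is ahead of its allowed rate, in requests
    last = None    # arrival time of the last accepted request
    results = []
    for t in arrivals:
        if last is None:
            candidate = 0.0                      # the first request is never excess
        else:
            drained = (t - last) * rate          # the bucket drains at `rate` per second
            candidate = max(0.0, excess - drained + 1.0)
        if candidate > burst:
            results.append((t, "503"))           # beyond burst: rejected, not queued
            continue
        excess, last = candidate, t
        delay = 0.0 if nodelay else excess / rate
        results.append((t, delay))
    return results


burst_of_8 = [0.0] * 8  # eight requests arriving at the same instant
for nodelay in (True, False):
    out = simulate(burst_of_8, rate=20, burst=5, nodelay=nodelay)
    served = [o for _, o in out if o != "503"]
    print(nodelay, len(served), "served,", len(out) - len(served), "rejected")
```

In this model, with rate=20 and burst=5, six of the eight simultaneous requests are served (one plus the burst of five) and two are rejected, with or without nodelay. The difference is only in timing: with nodelay the six are served immediately, while without it they are spaced 1/rate seconds apart. So burst has no timeframe of its own; it is the depth of the bucket, which drains at the configured rate.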
