Hello Bobby_H and welcome to Stack Overflow!
When using the Nginx Ingress Controller on Kubernetes, you can set up rate limits with these annotations:
- `nginx.ingress.kubernetes.io/limit-connections`: number of concurrent connections allowed from a single IP address. A 503 error is returned when this limit is exceeded.
- `nginx.ingress.kubernetes.io/limit-rps`: number of requests accepted from a given IP each second. The burst limit is set to this limit multiplied by the burst multiplier (default: 5). When clients exceed this limit, the status code configured in `limit-req-status-code` (default: 503) is returned.
- `nginx.ingress.kubernetes.io/limit-rpm`: number of requests accepted from a given IP each minute. The burst limit is set to this limit multiplied by the burst multiplier (default: 5). When clients exceed this limit, the status code configured in `limit-req-status-code` (default: 503) is returned.
- `nginx.ingress.kubernetes.io/limit-burst-multiplier`: multiplier of the limit rate for the burst size. The default burst multiplier is 5; this annotation overrides the default. When clients exceed this limit, the status code configured in `limit-req-status-code` (default: 503) is returned.
- `nginx.ingress.kubernetes.io/limit-rate-after`: initial number of kilobytes after which further transmission of a response to a given connection is rate limited. This feature must be used with proxy buffering enabled.
- `nginx.ingress.kubernetes.io/limit-rate`: number of kilobytes per second allowed to be sent to a given connection. A value of zero disables rate limiting. This feature must be used with proxy buffering enabled.
- `nginx.ingress.kubernetes.io/limit-whitelist`: client IP source ranges to be excluded from rate limiting, given as a comma-separated list of CIDRs.
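For instance, the annotations can be applied to an Ingress resource like this (the Ingress name, host, and backend service here are hypothetical; the limit values match the `nginx.conf` sample further below):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-world                # hypothetical name
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "5"
    nginx.ingress.kubernetes.io/limit-rpm: "300"
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"
    nginx.ingress.kubernetes.io/limit-whitelist: "10.0.0.0/8"
spec:
  ingressClassName: nginx
  rules:
  - host: hello.example.com        # hypothetical host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: hello-world      # hypothetical service
            port:
              number: 80
```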
Nginx implements the leaky bucket algorithm: incoming requests are buffered in a FIFO queue and then consumed at a limited rate. The burst value defines the size of the queue, which allows requests exceeding the base limit to still be served. When the queue becomes full, subsequent requests are rejected with the error code above.
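To make the mechanism concrete, here is a minimal Python sketch of leaky-bucket behaviour (the `LeakyBucket` class is purely illustrative, not part of Nginx; the real implementation works at sub-second granularity):

```python
from collections import deque


class LeakyBucket:
    """Illustrative leaky bucket: requests queue up to `burst` slots
    and drain (are served) at `rate` requests per second."""

    def __init__(self, rate, burst):
        self.rate = rate      # requests drained per second
        self.burst = burst    # queue capacity (burst size)
        self.queue = deque()
        self.last = 0.0       # time of last drain

    def allow(self, now):
        # Drain requests that have "leaked" out since the last check.
        leaked = int((now - self.last) * self.rate)
        if leaked:
            for _ in range(min(leaked, len(self.queue))):
                self.queue.popleft()
            self.last = now
        # Accept if there is still room in the queue, otherwise reject.
        if len(self.queue) < self.burst:
            self.queue.append(now)
            return True       # served (possibly with a delay)
        return False          # queue full -> rejected (503)


bucket = LeakyBucket(rate=5, burst=25)
# At t=0 the first 25 requests fill the queue; the 26th is rejected.
results = [bucket.allow(0.0) for _ in range(26)]
print(results[24], results[25])  # -> True False
```

With `nodelay` (as in the generated config below), queued requests are served immediately rather than being spaced out, but the accounting is the same.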
Here you will find all important parameters to configure your rate limiting.
The number of expected successful requests can be calculated like this:
successful requests = (period * rate + burst) * nginx replica
It is important to note that the number of Nginx replicas also multiplies the number of successful requests. Note as well that the Nginx Ingress Controller sets the burst value to 5 times the limit by default. You can check these parameters in `nginx.conf` after setting up your desired annotations. For example:
limit_req_zone $limit_cmRfaW5ncmVzcy1yZC1oZWxsby1sZWdhY3k zone=ingress-hello-world_rps:5m rate=5r/s;
limit_req zone=ingress-hello-world_rps burst=25 nodelay;
limit_req_zone $limit_cmRfaW5ncmVzcy1yZC1oZWxsby1sZWdhY3k zone=ingress-hello-world_rpm:5m rate=300r/m;
limit_req zone=ingress-hello-world_rpm burst=1500 nodelay;
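The formula above can be checked against the values from this config (rate = 5 r/s, burst = 25); the 10-second window and the 2 replicas are assumed here purely for illustration:

```python
def successful_requests(period_s, rate_rps, burst, replicas):
    """Expected requests served before rejections begin:
    (period * rate + burst) per replica, multiplied across replicas."""
    return (period_s * rate_rps + burst) * replicas


# rate=5 r/s and burst=25 match the nginx.conf sample above;
# period of 10 s and 2 replicas are illustrative assumptions.
print(successful_requests(10, 5, 25, 2))  # -> 150
```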
There are two limitations that I would also like to underline:
Requests are counted by client IP, which might not be accurate, or might not fit your business needs, such as rate limiting by user identity.
Options like burst and delay are not configurable.
I strongly recommend going through the sources below as well for a more in-depth explanation of this topic: