429 at very low qps despite adequate headroom

Question

Xoogler in the cloud here. I have a very low qps service that serves HTML plus the follow-up resources. It typically sits idle and then receives something on the order of 20 requests over 5s with concurrency well below 10, where the concurrency limit is 80. I observe that clients regularly receive 429s from Cloud Run, typically after periods of service inactivity, even though an instance is still up (so it's not a cold-start problem). This can happen on the first request, but more often it hits somewhere in the middle of the sequence (i.e. icons, css don't load).

The instance is concurrent, responsive and could easily handle the load, but Cloud Run doesn't let it. No other instances are spun up either, although we're not even at the max of 2. Does this suggest that Cloud Run, for some reason, estimates that more than 2 instances are needed?

Here's a typical request sequence, redacted from the logs:

... 20 min idle ...
I 2020-03-27T18:21:27.619317Z GET 307 288 B 5 ms
I 2020-03-27T18:21:27.706580Z GET 302 0 B 0 ms
I 2020-03-27T18:21:27.760271Z GET 200 5.83 KiB 5 ms
I 2020-03-27T18:21:27.838066Z GET 200 1.89 KiB 4 ms
I 2020-03-27T18:21:27.882751Z GET 200 1.05 KiB 4 ms
I 2020-03-27T18:21:27.886743Z GET 200 582 B 3 ms
I 2020-03-27T18:21:27.893060Z GET 200 533 B 4 ms
I 2020-03-27T18:21:27.897352Z GET 200 5.35 KiB 4 ms
I 2020-03-27T18:21:27.899086Z GET 200 11.38 KiB 6 ms
I 2020-03-27T18:21:27.905967Z GET 200 22.48 KiB 13 ms
I 2020-03-27T18:21:27.906113Z GET 200 592 B 13 ms
I 2020-03-27T18:21:27.907967Z GET 200 35.08 KiB 14 ms
...500ms...
I 2020-03-27T18:21:28.434846Z GET 200 2.76 MiB 50 ms
I 2020-03-27T18:21:28.465552Z GET 200 2.29 MiB 67 ms <= up to here all resources served from image
...2500ms...
I 2020-03-27T18:21:31.086943Z GET 200 2.95 KiB 706 ms <= IO-bound, talking to backend api
...1600ms...
W 2020-03-27T18:21:32.674973Z GET 429 14 B 0 ms   <= !!!
W 2020-03-27T18:21:32.675864Z GET 429 14 B 0 ms   <= !!!
W 2020-03-27T18:21:32.676292Z GET 429 14 B 0 ms   <= !!!
I 2020-03-27T18:21:32.684265Z GET 200 547 B 6 ms
I 2020-03-27T18:21:32.686695Z GET 200 504 B 9 ms
I 2020-03-27T18:21:32.690580Z GET 200 486 B 12 ms

Conceivably, that last group is 6 parallel requests. Why would three be denied and three served? The service is way under capacity. A couple of reloads typically solve the issue.
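To put a number on "way under capacity", here is a rough sketch that estimates peak in-flight requests from the tail of the log excerpt above. It assumes (this is my assumption, not something the logs state) that each timestamp marks the start of a request and that the trailing "ms" value is its duration; `parse` and `peak_concurrency` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta

# Tail of the redacted log above (annotations stripped).
LOG = """\
I 2020-03-27T18:21:31.086943Z GET 200 2.95 KiB 706 ms
W 2020-03-27T18:21:32.674973Z GET 429 14 B 0 ms
W 2020-03-27T18:21:32.675864Z GET 429 14 B 0 ms
W 2020-03-27T18:21:32.676292Z GET 429 14 B 0 ms
I 2020-03-27T18:21:32.684265Z GET 200 547 B 6 ms
I 2020-03-27T18:21:32.686695Z GET 200 504 B 9 ms
I 2020-03-27T18:21:32.690580Z GET 200 486 B 12 ms
"""

# severity, timestamp, method, status, size, latency in ms
LINE = re.compile(r"^[IW] (\S+)Z GET (\d{3}) .* (\d+) ms")


def parse(log):
    """Return (start, end, status) tuples for every recognizable log line."""
    rows = []
    for line in log.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        # ASSUMPTION: timestamp = request start, latency = request duration.
        start = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        end = start + timedelta(milliseconds=int(m.group(3)))
        rows.append((start, end, int(m.group(2))))
    return rows


def peak_concurrency(rows):
    """Sweep over start/end events and return the maximum overlap."""
    events = []
    for start, end, _ in rows:
        events.append((start, 1))
        events.append((end, -1))
    # Starts sort before ends at the same instant, so 0 ms requests still count.
    events.sort(key=lambda e: (e[0], -e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


rows = parse(LOG)
rejected = sum(1 for _, _, status in rows if status == 429)
print(f"requests={len(rows)} rejected={rejected} peak_in_flight={peak_concurrency(rows)}")
```

Even if the timestamps turn out to mark request completion rather than start, the overlap in this burst stays in the low single digits, far below containerConcurrency: 80.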

It really appears to me as if the algorithm vastly overestimates the required resources after a period of inactivity. I'm happy to try a larger max-instances (redeployed to 10 now) but something really seems off with the estimates on the low end of the spectrum. If "2" as a max-instances setting is below what the platform supports, gcloud should probably enforce a higher minimum in the first place.

This is somewhat sad as it impacts people just "trying out" Cloud Run and they observe intermittent errors (partially rendered pages, ...) - which are even pinned on the client (4xx) who is certainly not at fault.

Happy to provide more data.

Configuration:

template:
    metadata:
...
      annotations:
...
        autoscaling.knative.dev/maxScale: '2'
    spec:
      timeoutSeconds: 900
...
      containerConcurrency: 80
      containers:
...
        resources:
          limits:
            cpu: 1000m
            memory: 244Mi

What does the Cloud Monitoring metric for Cloud Run CPU utilization show? — John Hanley
Thanks for getting back. CPU utilization @ 99p around that time is 20% max. I can play with memory. Right now I've changed the max-instances to 10 and will observe a little longer, so far no more 429s (but also very little usage). — mernst

Emil Gi · Accepted Answer · 2020-03-31T10:10:21

This looks like a known issue with Cloud Run. I would recommend starring it to receive notifications and expedite resolution.
