3
votes

I have a low-load application which experienced latency spikes (requests taking up to 10s to return) due to loading requests, as seen in the logs:

This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time.

Here I assume that "new process" means "new instance".

In order to avoid this, I fixed the number of idle instances to exactly one (max=1 and min=1), so there is always one instance running ("resident instance") and GAE shouldn't start new ones. Billing is enabled.

However, I still experience loading requests. Why? Can anything be done about this?

2
Do you have billing enabled? - Peter Knego
Yes, I have billing enabled. - paul
Is your instance shutting down? My very low-load app often has its single instance shut down by the Scheduler -- often only minutes after it spun up. I started a thread about this quite some time ago in GAE Google Groups. Was not alone. As usual, crickets from Google. Conclusion of the Groups thread was that low-qps apps will suffer unusual Scheduler behaviors. This implies Java is a complete "no go" for such apps due to its startup time. - stevep
Well, I always have one instance running (idle instances set to min=1), and I have set the pending latency to its maximum, so requests should wait instead of triggering new instances. Even so, GAE decides to start new instances on incoming requests (why?), and these then take around 10s to initialize. - paul

2 Answers

2
votes

Idle instances are "reserve" instances - they are meant to handle spikes when traffic increases, not the "normal" traffic. Idle instances are used only during the spin-up of the dynamic instances.

So, when you have one idle instance and no dynamic instances running and you get a request, then the idle instance should handle the request, but a new dynamic instance will still be spun up.

0
votes

I experienced the same problem with my low-traffic app. Here is the practical setup that almost always keeps my users from hitting a cold start:

- 1 resident F4 instance
- pending latency set to 15 sec
- warmup requests made as fast as possible (under 10 sec, which is still quite long because I use the Play framework on Java)
- when I really don't want any problems, I create fake traffic by pinging my app

With this config, the resident instance usually serves around 50 requests; during that time, a dynamic instance receives a warmup request and then starts serving.
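
For the "fake traffic" part, a minimal sketch of a pinger that could run on some machine outside the app is shown here. Everything specific in it is an assumption for illustration: the URL `https://your-app-id.appspot.com/ping` is a placeholder, and the one-minute interval and 3-second "slow response" threshold are arbitrary values, not GAE recommendations. It only keeps traffic flowing and logs slow responses; it does not guarantee that the scheduler will never start a new instance.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class KeepAlivePinger {
    // Placeholder URL (assumption): replace with a cheap endpoint of your own app.
    static final String TARGET = "https://your-app-id.appspot.com/ping";
    // Arbitrary threshold (assumption): slower responses are logged as possible cold starts.
    static final long SLOW_THRESHOLD_MILLIS = 3_000;

    // Sends one GET request and returns the HTTP status code.
    public static int ping(String url, int timeoutMillis) throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // True when a response took longer than the given threshold.
    public static boolean looksLikeColdStart(long elapsedMillis, long thresholdMillis) {
        return elapsedMillis > thresholdMillis;
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Ping once a minute (arbitrary interval). Exceptions are caught inside the
        // task because an uncaught exception would cancel all later runs.
        scheduler.scheduleAtFixedRate(() -> {
            long start = System.nanoTime();
            try {
                int status = ping(TARGET, 20_000);
                long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                String note = looksLikeColdStart(ms, SLOW_THRESHOLD_MILLIS)
                        ? " (slow, possibly a loading request)" : "";
                System.out.printf("status=%d time=%dms%s%n", status, ms, note);
            } catch (Exception e) {
                System.err.println("ping failed: " + e);
            }
        }, 0, 1, TimeUnit.MINUTES);
    }
}
```

Pointing the pinger at a lightweight endpoint keeps the cost per ping low, and the logged timings give a rough idea of how often a loading request still slips through.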