instance latency in Google App Engine

Question

I am running a free application on GAE's Python runtime, with max idle instances set to 1.

According to http://code.google.com/appengine/docs/adminconsole/instances.html,

Your application's latency has the biggest impact on the number of instances needed to serve your traffic. If you service requests quickly, a single instance can handle a lot of requests.

This seems to suggest that adjusting the slider in 'Application Settings' to minimum latency would be best.

However, according to http://code.google.com/appengine/docs/adminconsole/performancesettings.html#Setting_the_Minimum_Pending_Latency,

it seems like having a high latency is good for preventing load spikes from spinning up new instances.

So is latency basically a tradeoff between ability to respond to request spikes (high latency) vs. number of requests handled over a given time period (low latency)?

By latency, the docs mean your code's latency in responding to requests. Adjusting the slider has nothing to do with that. If you experience high latency but your code is fast, it means App Engine needed to start a new instance of your app. That might be because you just uploaded it, because nobody used it for a long time and the idle instance was shut down, or because another running instance is stuck. — ᆼᆺᆼ

Dan Sanderson · Accepted Answer · 2012-01-28T21:40:38

"Pending latency" refers to how long a request can be sitting in the queue before App Engine decides to spin up another instance. If all of your app instances are busy when a request arrives, the request will wait in a queue to be handled by the next available instance. If it's there beyond the minimum, App Engine may decide to start up a new instance to handle the request. (There's also a maximum pending latency setting you can adjust.)

The minimum pending latency is configurable because starting up a new instance takes time and costs money. A larger minimum pending latency means App Engine will hold onto pending requests longer (and make them wait) before starting new instances, favoring lower instance cost over the ability to handle more traffic. A smaller minimum pending latency means App Engine will start new instances sooner as traffic picks up, favoring responsiveness over cost.
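To make the tradeoff concrete, here is a minimal toy simulation in Python. It is not App Engine's actual scheduler: the function name `simulate`, its parameters, and the simplifications (one request at a time per instance, instance startup time ignored, instances never shut down) are all assumptions made for illustration.

```python
def simulate(arrivals, service_time, min_pending_latency):
    """Toy model of pending-latency scaling (NOT the real App Engine scheduler).

    arrivals: sorted request arrival times, in seconds.
    service_time: seconds each request takes once an instance picks it up.
    min_pending_latency: how long a request may wait in the queue before
        the toy scheduler starts a new instance for it.

    Returns (number_of_instances_started, longest_wait_in_seconds).
    """
    free_at = [0.0]  # time at which each instance becomes free; start with one
    max_wait = 0.0
    for t in arrivals:
        earliest = min(free_at)
        wait = max(0.0, earliest - t)  # projected time in the pending queue
        if wait > min_pending_latency:
            # The request would wait past the threshold, so a new instance
            # is started once the threshold is reached (startup time ignored).
            wait = min_pending_latency
            free_at.append(t + wait + service_time)
        else:
            # An existing instance serves the request when it becomes free.
            i = free_at.index(earliest)
            free_at[i] = max(earliest, t) + service_time
        max_wait = max(max_wait, wait)
    return len(free_at), max_wait


# A short burst: four requests 10 ms apart, each taking 100 ms to serve.
burst = [0.0, 0.01, 0.02, 0.03]
low = simulate(burst, service_time=0.1, min_pending_latency=0.05)
high = simulate(burst, service_time=0.1, min_pending_latency=0.5)
print("low threshold  (instances, longest wait):", low)
print("high threshold (instances, longest wait):", high)
```

In this toy model the low threshold starts an extra instance for every queued request in the burst, while the high threshold keeps everything on one instance and makes the later requests wait longer.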

The term "latency" simply refers to how long it takes for your app to respond to a request. The faster your app can respond to requests, the more requests a single instance can handle, and the shorter the request queue will typically be. Lower latency is always good, but it's up to the app to do what it needs to do quickly.
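As a rough back-of-the-envelope sketch of why lower app latency means fewer instances, the estimate below multiplies request rate by latency to get the average number of requests in flight. The function `instances_needed` and its parameters are hypothetical, and the estimate assumes steady traffic with no variance, which real traffic will not match.

```python
import math


def instances_needed(requests_per_second, latency_ms, concurrent_per_instance=1):
    """Rough estimate of instances needed to keep up with steady traffic.

    Average requests in flight = arrival rate * time per request.
    This ignores bursts, startup time, and queueing variance.
    """
    in_flight = requests_per_second * latency_ms / 1000.0
    return max(1, int(math.ceil(in_flight / concurrent_per_instance)))


# Same traffic, two different app latencies (hypothetical numbers).
fast = instances_needed(requests_per_second=20, latency_ms=50)
slow = instances_needed(requests_per_second=20, latency_ms=500)
print("fast app:", fast, "instance(s); slow app:", slow, "instance(s)")
```

With these made-up numbers, cutting the app's response time by a factor of ten cuts the estimated instance count by the same factor, which is the point of the sentence quoted in the question.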
