Optimizing Application Architecture and Implementation for Google App Engine

Question

It's my understanding that billing on GAE all boils down to instance-hours ("IH"), or how many server instances are running for some duration of time. However, it is obviously not that simple, because in addition to IH you have quotas and resource limits that you must be leery of throughout the day (since quotas replenish every 24 hours).
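A simplified way to think about the instance-hour part of the bill, given that free quotas reset every 24 hours: add up how long each instance was up during the day (idle time included), subtract the daily free allowance, and multiply what remains by the hourly price. This is a sketch under assumptions: `daily_instance_hour_cost` is a hypothetical helper, it ignores other billed resources (datastore operations, bandwidth, etc.) and instance classes, and the free-hour and price values in the example are placeholders, not Google's published rates.

```python
def daily_instance_hour_cost(instance_uptimes_hours, free_hours_per_day, price_per_hour):
    """Estimate one day's instance-hour charge (simplified, hypothetical model).

    instance_uptimes_hours: hours each instance was up during the day,
        including idle time before it was shut down.
    free_hours_per_day: the daily free instance-hour allowance (resets every 24h).
    price_per_hour: price of one billable instance-hour (placeholder value).
    Returns (total_hours, billable_hours, cost).
    """
    total_hours = sum(instance_uptimes_hours)
    # Only hours beyond the free allowance are billed.
    billable_hours = max(0.0, total_hours - free_hours_per_day)
    return total_hours, billable_hours, billable_hours * price_per_hour
```

For example, three instances that ran 24, 10, and 3.5 hours total 37.5 instance-hours; with a hypothetical allowance of 28 free hours, 9.5 of them are billable. Every instance you avoid spinning up (or shut down sooner) comes straight off that billable figure.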

I am in the process of designing my first GWT/GAE app, and have come across many articles (some of which are cited below) in which the authors describe major refactorings they had to make to their code after release in order to minimize billing and operational costs with Google.

In one instance, a developer implemented a set of optimizations that took the same app from $7/day (~$220/month) down to $0, because it finally fell under the "free" quotas and billing thresholds for resources.

Being so new to GAE, I'm wondering whether there is a set of principles or practices I can build into the architecture/design of my app upfront that, once carried through into functional code and deployed to GAE, will make the app run as efficiently (monetarily speaking) as possible.

Here are some deductions I've made so far:

Maximize caching and minimize datastore hits
Push as much asynchronous request handling to backend instances as possible
Enable concurrent HTTP request handling so that the same instance can handle multiple requests at the same time
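The first deduction (check the cache first and hit the datastore only on a miss) is the cache-aside pattern. Below is a minimal sketch in plain Python, assuming an in-memory dict with a TTL as a stand-in for memcache and a hypothetical `SlowStore` class that counts reads in place of the datastore; neither is an App Engine API.

```python
import time


class SlowStore:
    """Hypothetical stand-in for the datastore: every get() is counted."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._data.get(key)


class CacheAside:
    """Cache-aside reads with a TTL; the dict stands in for memcache."""

    def __init__(self, store, ttl_seconds=60.0, clock=time.monotonic):
        self._store = store
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]  # cache hit: no datastore read
        value = self._store.get(key)  # cache miss: one datastore read
        self._cache[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Call after writing `key` to the store so readers don't see stale data."""
        self._cache.pop(key, None)
```

With this in place, 100 reads of the same key inside the TTL cost one datastore read instead of 100. Unlike this dict, memcache can evict entries at any time, so the code must always be able to fall back to the datastore, which the miss path here already does.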

So my question: are any of these generalizations incorrect, and if so, why (or are they conditional, holding true in some cases but not in others)? Am I missing anything critical? For instance: how to determine what code belongs on a backend instance (where resource constraints are a little more lax), and which GAE-specific profiling tools (AppStats, SpeedTracer, etc.) to use to find bottlenecks.

Also, some cited articles:

Configuring Max Idle and Minimum Latency
GAE's own scaling best practices
An example of CPU optimization

Ibrahim Arief · Accepted Answer · 2012-08-23T06:22:39

Based on experience, there is a huge laundry list of strategies for App Engine optimization, and which ones apply depends on the nature of your app. Here are some more tips that I know of:

For apps that serve a large amount of relatively static content, enabling the (as yet undocumented) edge caching could be a blessing for your weekly bills.
Even with concurrent requests/threadsafe enabled, each frontend instance can only process 8 (Python) to 10 (Java, Go) simultaneous incoming requests before the scheduler decides to spin up a new instance for you.
To counter that restriction, I believe there's a Google I/O video that recommends keeping the response time of any user-facing request to the frontend instances at around 100 ms.
In line with that strategy, if you have any task that requires a large amount of processing or datastore I/O, offload it to a push task queue and respond to the request immediately. You can specify the target of the task queue, but for this purpose it does not need to be a backend; frontend instances are good enough and offer practically unlimited scalability.
If you use that strategy but still need to return the result of the processing or I/O to the user, use the Channel API or another messaging service to send the result back asynchronously.
Task queues are amazing for distributing your app's workload. You can customize their behavior in detail, and they are invaluable for making sure your app scales nicely. You can even set up two-way communication between instances using both push and pull queues.
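To see why the ~100 ms target matters given the per-instance concurrency limit, here is a back-of-the-envelope estimate based on Little's law (average requests in flight = arrival rate × average latency). It is a rough sketch, not how the scheduler actually decides; `instances_needed` is a hypothetical helper, and the limit of 8 concurrent requests is simply the Python figure quoted above, used as an input.

```python
import math


def instances_needed(requests_per_second, mean_latency_seconds, max_concurrent_per_instance):
    """Rough lower bound on frontend instances, from Little's law (L = lambda * W).

    requests_per_second: average arrival rate of user-facing requests.
    mean_latency_seconds: average time each request occupies an instance.
    max_concurrent_per_instance: simultaneous requests one instance accepts
        before the scheduler starts another (e.g. 8 for Python, per the answer).
    """
    in_flight = requests_per_second * mean_latency_seconds
    return max(1, math.ceil(in_flight / max_concurrent_per_instance))
```

For example, at 200 requests/s with 8 concurrent requests per instance, a 500 ms average response needs at least 13 instances, while a 100 ms response needs at least 3. Traffic bursts and latency variance push the real number higher, but the ratio shows why shaving latency off frontend requests cuts instance-hours.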
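The offload-and-respond pattern described above can be sketched in plain Python. This is not the App Engine taskqueue API: a standard-library `queue.Queue` and a worker thread stand in for a push task queue and its task handler, a dict stands in for wherever the result is delivered (datastore, Channel API), and the names `handle_request`, `worker`, and `expensive_processing` are hypothetical. The point is only that the user-facing handler does a constant amount of work and returns before the heavy processing runs.

```python
import queue
import threading

# Hypothetical in-process stand-ins: on GAE these roles would be played by a
# push task queue (task_queue) and the datastore / Channel API (results).
task_queue = queue.Queue()
results = {}
results_lock = threading.Lock()


def expensive_processing(payload):
    """Placeholder for heavy CPU or datastore I/O work."""
    return sum(x * x for x in payload)


def handle_request(request_id, payload):
    """User-facing handler: enqueue the heavy work and return immediately."""
    task_queue.put((request_id, payload))
    # 202 Accepted: the work has been queued, not finished.
    return {"status": 202, "request_id": request_id}


def worker():
    """Task handler loop: processes queued work off the request path."""
    while True:
        item = task_queue.get()
        try:
            if item is None:  # sentinel used to stop the worker
                return
            request_id, payload = item
            result = expensive_processing(payload)
            with results_lock:
                # A real app might push this to the browser via the Channel API.
                results[request_id] = result
        finally:
            task_queue.task_done()
```

To try it, start `threading.Thread(target=worker, daemon=True)`, call `handle_request`, and wait on `task_queue.join()` before reading `results`. On App Engine the queue service would invoke the task handler over HTTP instead of a local thread.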

(I'll add more points later on.)
