It's my understanding that billing on GAE all boils down to instance-hours ("IH"), that is, how many server instances are running and for how long. However, it is obviously not that simple, because in addition to IH you have quotas and resource limits that you must be wary of throughout the day (since quotas replenish every 24 hours).
I am in the process of designing my first GWT/GAE app, and have come across many articles (some of which are cited below) where the authors talk about major refactorings they had to make to their code - post release - in order to help minimize billing and operational costs with Google.
In one instance, a developer implemented a set of optimizations that took his GAE app from $7/day (~$220/month) down to $0, because it finally fell under the "free" quotas and billing thresholds for resources.
Being so new to GAE, I'm wondering whether there is a set of principles or practices I can incorporate into the architecture/design of my app upfront so that, once implemented and deployed to GAE, the app runs as efficiently (monetarily speaking) as possible.
Here are some deductions I've made so far:
- Maximize caching and minimize datastore hits
- Try to push as much asynchronous request handling to backend instances as possible
- Enable concurrent HTTP request handling so that the same instance can handle multiple requests at the same time
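
To make the first bullet concrete, this is roughly the pattern I have in mind. It is only a sketch under my own assumptions: the in-memory map stands in for memcache, the `loader` function stands in for a datastore query, and `ReadThroughCache` is a hypothetical name of mine, not a GAE API. It assumes Java 8 or later for the lambdas.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through cache. The map is a stand-in for memcache and
// `loader` is a stand-in for a datastore read; neither is a real GAE call.
class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final AtomicInteger loaderCalls = new AtomicInteger();

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, calling the loader only on a cache miss.
    V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            loaderCalls.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Drops a key after a write so the next read reloads fresh data.
    void invalidate(K key) {
        cache.remove(key);
    }

    // Number of simulated datastore reads performed so far.
    int loaderCalls() {
        return loaderCalls.get();
    }

    public static void main(String[] args) {
        ReadThroughCache<String, String> users =
                new ReadThroughCache<>(id -> "user-" + id);
        users.get("42"); // miss: simulated datastore read
        users.get("42"); // hit: served from the cache
        users.get("7");  // miss: simulated datastore read
        System.out.println("simulated datastore reads: " + users.loaderCalls()); // prints 2
    }
}
```

The idea is that repeated reads of the same key cost one datastore operation instead of many, at the price of having to invalidate on writes. Whether that trade-off actually holds up on GAE is part of what I'm asking.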
So my question: are any of these generalizations incorrect, and if so, why (or are they conditional, holding true in some cases but not in others)? Am I missing anything critical here? For instance, how do I determine what code belongs on a backend instance (where resource constraints are a little more lax), and which GAE-specific profiling tools (AppStats, SpeedTracer, etc.) should I use to find bottlenecks?
Also, some cited articles:
- Configuring Max Idle and Minimum Latency
- GAE's own scaling best practices
- An example of CPU optimization