Python App Engine webapp2 slow to route

Question

I have a Python App Engine application which serves about 3M requests per day. I am trying to optimize the app to save on my ridiculously ballooned hosting bill.

November, App Engine Frontend Instances: 12924.391 Hours, $604.22

I have the request handling down to some memcached calls, but now I've noticed that it usually takes about 20ms, sometimes as long as 166ms before webapp2 even passes the request to me.

In the image below you can see a Trace showing "Post" happening at 166ms.

Here is the code serving this.

import logging
logging.info("main.py logging imported")
from imports import *
from handlers import *
logging.info("completed importing others")

class Main(webapp.RequestHandler):
    def post(self):
        logging.info("Post")
        self.get()


...

app = webapp.WSGIApplication(
    [
     ('/.*', Main)
    ],
    debug=False
)

What have I tried?

I have threadsafe enabled, so imports should only happen once per instance rather than on every request. Just to be sure, I also added logging to see when the imports happen, and as you can see they are not done for every request.

More information

It is not important that the latency is low, except to save on the hosting bill. I would be fine even with a 1 minute response time (the requests are an API webhook), as long as it wouldn't count for front-end instance time!

In case it is relevant, here is the beginning of my app.yaml.

application: coolestsports-hrd
version: 1
runtime: python27
threadsafe: yes
api_version: 1
automatic_scaling:
  min_idle_instances: 1
  min_pending_latency: 1000ms

I'm not sure what your url scheme looks like but, if possible, it may help to remove the wildcard chars from the url matching — B Rad C
@BRadC I added direct route for the url on top of app.yaml and also in WSGIApplication before wildcard, but doesn't seem different (still takes 20+ ms per request). — Bemmu
I wonder if the "frontend instance" time includes TCP socket establishment cost (including a roundtrip)? That might explain this — Bemmu
Potentially of interest: stackoverflow.com/a/40912401/4495081 — Dan Cornilescu

Dan Cornilescu · Accepted Answer · 2016-12-06T14:13:44

If keeping the instance hours bill low matters more than keeping the request latency low, then maybe you can drop automatic scaling in favour of basic scaling. From Scaling types and instance classes:

Basic Scaling

A service with basic scaling will create an instance when the application receives a request. The instance will be turned down when the app becomes idle. Basic scaling is ideal for work that is intermittent or driven by user activity.

Automatic Scaling

Automatic scaling is based on request rate, response latencies, and other application metrics.

Automatic Scaling targets a better user experience and can launch a large number of instances based on the incoming traffic patterns.

There are config parameters that you can use to tune the scaling behaviour, but with automatic scaling there is fundamentally no parameter that limits the number of instances running in parallel, which can lead to ballooning instance hours. BTW, your min_idle_instances: 1 will pretty much keep one instance alive at all times, almost always idle (other instances will actually handle the bulk of the requests).
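To get a feel for the scale, here is a rough back-of-the-envelope sketch using only the November figure from the question. It assumes each billed instance hour corresponds to one instance running for one hour, which may not hold exactly for your instance class or for how idle time is billed, so treat the results as estimates.

```python
# Figure taken from the November bill quoted in the question.
billed_instance_hours = 12924.391
hours_in_november = 30 * 24  # 720 wall-clock hours


def average_instances(instance_hours, hours_in_period):
    """Average number of instances running concurrently over the period."""
    return instance_hours / hours_in_period


def always_on_share(instance_hours, hours_in_period):
    """Fraction of the bill attributable to one instance kept alive all month."""
    return hours_in_period / instance_hours


avg = average_instances(billed_instance_hours, hours_in_november)
idle_share = always_on_share(billed_instance_hours, hours_in_november)

print("average concurrent instances: %.2f" % avg)
print("share of one always-on instance: %.1f%%" % (idle_share * 100))
```

Under that assumption the bill corresponds to roughly 18 instances running around the clock, of which the single always-idle instance is only a small part.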

Basic scaling on the other hand has a max_instances config which can be used to effectively cap the bill's instance hours:

max_instances

Required. The maximum number of instances for App Engine to create for this service version. This is useful to limit the costs of a service.
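As a minimal sketch, the beginning of your app.yaml might then look like the fragment below. The values for max_instances, idle_timeout and instance_class are placeholders I picked for illustration, not recommendations; basic scaling uses the B instance classes, so check which one fits your memory and CPU needs.

```yaml
application: coolestsports-hrd
version: 1
runtime: python27
threadsafe: yes
api_version: 1
# Replaces the automatic_scaling block. All values below are illustrative.
instance_class: B1
basic_scaling:
  # Hard cap: at most 2 instances, so at most 2 * 720 = 1440 instance hours
  # in a 30-day month.
  max_instances: 2
  # Shut an instance down after it has been idle for this long.
  idle_timeout: 10m
```

With a cap like this, requests that arrive while all instances are busy wait longer instead of triggering new instances, which is the trade-off you said you can accept for a webhook.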
