Our project runs on the Google App Engine standard environment with automatic scaling configured as shown below. Warmup requests are enabled in the app, and we use the Google Endpoints service. However, I am seeing high latency in several scenarios.

Environment:
- Runtime: Java 8
- Instance type: F4_1G

Autoscaling configuration:
- min-instances: 2
- max-concurrent-requests: 80
- min-pending-latency: 6s
- max-pending-latency: 10s
I tested with JMeter, sending 85 asynchronous requests with a ramp-up period of 10 seconds. From the application logs, I can see that App Engine takes a long time to serve the requests. My questions are below.
1. Most of the requests fail because they exceed the time limit. In image 1, the request takes 88.2 seconds. I know that App Engine automatic scaling has a 60-second request timeout. But we configured autoscaling with a minimum of 2 instances and no max-instances restriction, so either the existing instances should handle the requests or App Engine should scale up to handle them. Why is that not happening? Image_1
2. While scaling up, the request takes 43.6 seconds. In image 2, the request arrives at 20:27:01:663 IST, but the first line of the API code executes at 20:27:40:407 IST. What is happening in between? Can I get logs for this period? Image_2
3. After the scale-up, subsequent requests also take a very long time to serve, even though an API request usually completes within 2 seconds. In image 3, the API call takes 42.4 seconds even though it is not a loading request. The request arrives at 20:27:01:728 IST, and the first line of the API code executes at 20:27:40:708 IST. What is happening in between? Image_3
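
To put a number on the unexplained gaps in the last two questions, the timestamp pairs quoted from the logs can be subtracted. This is only a minimal sketch: the class name `LogGap`, the method `gapMillis`, and the `HH:mm:ss:SSS` pattern are my own assumptions for parsing the timestamps as written above. Nothing here comes from App Engine itself.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

final class LogGap {
    // Assumed pattern for timestamps as quoted in the question, e.g. 20:27:01:663.
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss:SSS");

    // Milliseconds between the request arriving and the first application log line.
    // Assumes both timestamps fall on the same day and in the same time zone.
    static long gapMillis(String received, String firstLogLine) {
        LocalTime start = LocalTime.parse(received, LOG_TIME);
        LocalTime end = LocalTime.parse(firstLogLine, LOG_TIME);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        // Image 2: request during scale-up.
        System.out.println(gapMillis("20:27:01:663", "20:27:40:407") + " ms"); // 38744 ms
        // Image 3: request after scale-up.
        System.out.println(gapMillis("20:27:01:728", "20:27:40:708") + " ms"); // 38980 ms
    }
}
```

So roughly 38.7 s of the 43.6 s request and 39.0 s of the 42.4 s request pass before any application code runs, which is the period I cannot account for in the logs.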