3
votes

Problem

I have had an application running on a Cloud Run instance for 5 months now. The application has a startup time of about 3 minutes, and once startup is over it does not need much RAM. Here are two snapshots of docker stats when I run the app locally:

When the app is idle:

[screenshot: docker stats, RAM usage at idle]

When the app is receiving 10 requests per second (which is way over our use case for now):

[screenshot: docker stats, RAM usage under load]

There aren't any problems when I run the app locally, but problems arise when I deploy it on Cloud Run. I keep receiving "OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k" messages, followed by a restart of the app. This is a problem because, as I said, the app takes up to 3 minutes to restart, during which requests take a long time to be handled.

I already fixed the cold start issue by setting a minimum of 1 instance AND using Google Cloud Scheduler to query the service every minute.
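For reference, this is roughly how that keep-warm setup can be configured (a sketch only; the service name, region, job name and URL below are placeholders):

    # Keep at least one instance warm (--min-instances may require the beta
    # track depending on your gcloud version).
    gcloud run services update my-service \
      --region=europe-west1 \
      --min-instances=1

    # Ping the service every minute so it never goes completely idle.
    gcloud scheduler jobs create http keep-warm \
      --schedule="* * * * *" \
      --http-method=GET \
      --uri="https://my-service-xxxxxxxxxx-ew.a.run.app/"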

Examples

Here are examples of what I see in the logs.

[screenshot: first example of the warnings followed by a restart]

[screenshot: second example of the warnings, appearing just after a restart]

In the second example, the warnings came once again just after the application restarted, which caused a second restart in a row; this happens quite often. Also note that these warnings/restarts do not necessarily happen when users are connected to the app, but can happen when the only activity comes from Google Cloud Scheduler.

I tried increasing the allocated resources to 4 CPUs and 4 GB of RAM (which is a huge overkill), and yet the problem remains.
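For completeness, the resource bump was done with something along these lines (a sketch; service name and region are placeholders):

    gcloud run services update my-service \
      --region=europe-west1 \
      --cpu=4 \
      --memory=4Gi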

Update 02/21: As of 01/01/21 we stopped witnessing this behavior from our Cloud Run service (maybe due to an update, I don't know). I did contact GCP support, but they just told me to raise an issue on the OpenBLAS GitHub repo; since I can no longer reproduce the behavior, I did not do so. I'll leave the question open, as nothing I did really worked.


1 Answer

3
votes

OpenBLAS performs high-performance compute optimizations and needs to know the CPU's capabilities in order to tune itself as well as possible.

However, when you run a container on Cloud Run, it runs inside the gVisor sandbox, which increases the security and isolation of all the containers running on the same serverless platform.

This sandbox intercepts low-level kernel calls and discards the abnormal/dangerous ones. I guess that this is why OpenBLAS can't determine the L2 cache size. In your local environment you don't have this sandbox, and you can access the CPU info directly.
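If you want to check what the sandbox actually exposes, you can open a shell in the container (or run the same image locally) and compare what the standard Linux interfaces report; how exactly OpenBLAS probes them is my assumption:

    # CPU model and flags as seen from inside the container
    cat /proc/cpuinfo

    # Cache topology exposed through sysfs (index2 is usually the L2 cache);
    # on Cloud Run/gVisor these entries may be missing or empty.
    ls /sys/devices/system/cpu/cpu0/cache/
    cat /sys/devices/system/cpu/cpu0/cache/index2/size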

Why does it restart? It could be a problem with OpenBLAS, or a problem with Cloud Run (a suspicious kernel call that makes the platform kill the instance and restart it).

I don't have an immediate solution because I don't know OpenBLAS. I had similar behavior with TensorFlow Serving, and TensorFlow offers a version compiled without any CPU optimization: less efficient, but more portable and resilient to different environment constraints. If a similar build exists for OpenBLAS, it could be worth testing.
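If you build OpenBLAS yourself, something like the following might be worth trying (a sketch only, untested on Cloud Run; TARGET=GENERIC and DYNAMIC_ARCH are documented OpenBLAS build options, the install prefix is a placeholder):

    git clone https://github.com/xianyi/OpenBLAS.git
    cd OpenBLAS

    # Build for a generic CPU target instead of auto-detecting the host CPU
    # at build time.
    make TARGET=GENERIC

    # Alternative: build kernels for several CPU families and select one at runtime.
    # make DYNAMIC_ARCH=1

    make install PREFIX=/opt/openblas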