12
votes

I have a Django app on Google App Engine that is connected to a Google Cloud SQL instance, using App Engine authentication.

Most of the time everything works fine, but from time to time the following exception is raised:

OperationalError: (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 38")

According to the docs, this error is returned when:

If Google Cloud SQL rejects the connection, for example, because the IP address your client is connecting from is not authorized.

This doesn't make much sense in my case, because the authentication is done by the App Engine server.

What might cause these sporadic errors?

Just to make sure, your application is deployed to the cloud, right? You're not running on localhost? - Gwell
@Gwell Yes, it's on the GAE cloud. - Tzach
I couldn't find much info about error 38. Most reports of losing the connection to the MySQL server at 'reading initial communication packet' had to do with the SQL settings, particularly timeouts and authorization, but these were all localhost issues. Take a look at this doc: developers.google.com/cloud-sql/docs/admin-api/v1beta1/… and see if there is any setting you can modify on your Cloud SQL instance that could resolve this issue. - Gwell
Did you set your app to run only on EU servers? - Gwell
I have the same issue from time to time. I run Django 1.5 on App Engine using Cloud SQL and occasionally get the exact same error. - Aaron

4 Answers

15
votes

I had a similar issue and ended up contacting Google for help. They explained that it happens when they need to restart or move an instance. If the client instance was restarted or moved to another host server (for various versions), the IPs won't match and that error is thrown. They mentioned that the servers may restart for patches, errors, and slowdowns, causing similar behavior (be it the same error or a similar one). The server also moves to try to be closer to the instances to improve response times. If you send a request during the move, it will throw errors.

They told me I need to code in retry logic in case that happens, similar to how you handle datastore timeouts. Keep in mind to build in backoff: sending too many requests too quickly after a restart could cause a crash.
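
The retry-with-backoff advice could be sketched roughly as below. This is only an illustration, not something Google provided: `retry_with_backoff` and its parameters are hypothetical names, and the delays are arbitrary starting points. The exception type is passed in so the helper does not depend on Django.

```python
import random
import time


def retry_with_backoff(func, retry_on, max_attempts=5, base_delay=0.5,
                       max_delay=8.0, sleep=time.sleep):
    """Call func(), retrying on `retry_on` exceptions with exponential backoff.

    Hypothetical helper: the name, defaults, and jitter range are assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Double the delay on each failure, capped at max_delay, and add
            # jitter so many clients do not retry in lockstep after a restart.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

In a Django view this might be called as `retry_with_backoff(lambda: list(MyModel.objects.all()), OperationalError)`, with `OperationalError` imported from `django.db` and `MyModel` standing in for one of your own models.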

How often does this happen?

3
votes

In our case we had renamed the instances incorrectly inside the code. When we changed back to the correct names everything worked fine. Make sure your Cloud SQL instance is named correctly both in the Google Cloud Console and in the code you use to access it, and make sure that the Cloud SQL instance's Access Control settings allow your Google App Engine instance to connect to it.
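
For reference, the place where the instance name usually appears in a Django project is the `DATABASES` setting. The fragment below is a sketch under assumptions: it uses the Unix-socket style of `HOST` that App Engine uses for Cloud SQL, and every value (`INSTANCE_CONNECTION_NAME`, database name, user) is a placeholder. The exact form of the connection name depends on your instance, so copy it from the console rather than typing it by hand.

```python
# Hypothetical placeholder: replace with the connection name shown for your
# instance in the Google Cloud Console.
INSTANCE_CONNECTION_NAME = "your-project-id:your-instance-name"

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        # On App Engine the instance is reached through a Unix socket whose
        # path embeds the instance connection name, so a typo here breaks
        # the connection.
        "HOST": "/cloudsql/" + INSTANCE_CONNECTION_NAME,
        "NAME": "your_database",  # placeholder database name
        "USER": "your_user",      # placeholder database user
    }
}
```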

1
votes

In my case the issue was caused by an expired server SSL certificate on the Cloud SQL instance. Strangely, the expiry was not shown in the Google Cloud Console; I figured it out after downloading the certificate and decoding it with openssl (openssl x509 -in server-ca.pem -text -noout).

I was able to find the cause of the problem after trying to connect with cloud_sql_proxy; luckily it gave a more meaningful error message: couldn't connect to "...": x509: certificate has expired or is not yet valid.

Connections from the App Engine Standard application started to work immediately after resetting the SSL configuration from the Google Cloud Console. I noticed that after the reset, the validity date appeared in the console.

-1
votes

I had this problem too using Django 1.10 and GAE. The application worked fine locally (connecting to Cloud SQL via cloud_sql_proxy), but I'd get the 38 error when using the GAE instance of the application.

My problem turned out to be my database user. The user had a hyphen in it. Once I created a new user without a hyphen and changed my application to use the new user, the GAE instance of the application worked if