We are having an intermittent issue with connections to our MySQL server timing out. The error we receive is as follows:
(2003, 'Can\'t connect to MySQL server on \'<connection>\' ((2013, "Lost connection to MySQL server during query (error(104, \'Connection reset by peer\'))"))')
Call stack:
File "/usr/lib64/python2.7/site-packages/pymysql/connections.py", line 818, in _connect
2003, "Can't connect to MySQL server on %r (%s)" % (self.host, e))
File "/usr/lib64/python2.7/site-packages/pymysql/connections.py", line 626, in __init__
self._connect()
Some more info:
- We have a fleet of EC2 servers that are constantly running queries against a backend RDS instance.
- We average about 500 connections per second to the RDS
- We have around 0 - 4 hiccups per RDS per day
- The hiccups don't correspond with our maintenance window
- When we hit a hiccup, it can affect quite a few connections (around 50)
- When a hiccup happens it will disrupt connections across all servers and ports
The error itself appears to be generated by the TCP connection being closed on the EC2 side. Our TCP keepalive time is set to 7200 seconds, and that is when the error fires.
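For reference, here is a minimal sketch of how the keepalive values in effect on an EC2 host could be read and compared across the fleet. It assumes a Linux host exposing the standard /proc/sys/net/ipv4 files; the function name read_keepalive_settings is only for illustration. Note that these kernel values only apply to sockets that have SO_KEEPALIVE enabled.

```python
import os

# Standard Linux locations for the kernel's TCP keepalive settings.
KEEPALIVE_DIR = "/proc/sys/net/ipv4"
KEEPALIVE_FILES = (
    "tcp_keepalive_time",    # idle seconds before the first probe is sent
    "tcp_keepalive_intvl",   # seconds between successive probes
    "tcp_keepalive_probes",  # unanswered probes before the connection is dropped
)


def read_keepalive_settings(base_dir=KEEPALIVE_DIR):
    """Return a dict of setting name -> int, skipping files that cannot be read."""
    settings = {}
    for name in KEEPALIVE_FILES:
        path = os.path.join(base_dir, name)
        try:
            with open(path) as handle:
                settings[name] = int(handle.read().strip())
        except (IOError, OSError, ValueError):
            # File absent (for example a non-Linux host) or not an integer: leave it out.
            continue
    return settings


if __name__ == "__main__":
    for key, value in sorted(read_keepalive_settings().items()):
        print("%s = %d" % (key, value))
```

On a host configured as described above, tcp_keepalive_time would be expected to report 7200.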
My question is: what can be done to track down why these hiccups happen? It's great that they don't happen often, but it's not ideal that they happen at all.
Any advice would be appreciated, thanks!
Update 10/29:
I've been running a service that checks for long-running processes on the MySQL server, and it looks like these errors aren't getting that far. A new process is never created for the failing connection! I am still seeing the hiccups, just with no sign of the connections on the server.
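Since the failures never show up on the server, timing the connection attempts on the client side may be one way to gather more data. The following is only a sketch under that assumption: connect_with_timing and its parameters are hypothetical names, and connect_fn stands for any zero-argument callable that opens a connection (for example a lambda wrapping pymysql.connect with our own settings).

```python
import logging
import time

log = logging.getLogger("db_connect_probe")


def connect_with_timing(connect_fn, attempts=3, delay_seconds=0.5, sleep_fn=time.sleep):
    """Call connect_fn until it succeeds, logging how long each attempt took.

    Returns (connection, failures), where failures is a list of
    (elapsed_seconds, exception) tuples, one per failed attempt.
    If every attempt fails, the last exception is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    failures = []
    for attempt in range(1, attempts + 1):
        started = time.time()
        try:
            connection = connect_fn()
        except Exception as exc:
            elapsed = time.time() - started
            failures.append((elapsed, exc))
            log.warning("connect attempt %d failed after %.3fs: %r",
                        attempt, elapsed, exc)
            if attempt < attempts:
                # Brief pause so a short-lived hiccup has a chance to clear.
                sleep_fn(delay_seconds)
        else:
            log.info("connect attempt %d succeeded after %.3fs",
                     attempt, time.time() - started)
            return connection, failures
    # Every attempt failed: surface the last error to the caller.
    raise failures[-1][1]
```

The elapsed time on each failure is the interesting part: a reset that arrives almost immediately would point somewhere different than one that arrives only after a long wait, and the timestamps in the log can be lined up across servers.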