5 votes

Recently upgraded to MySQL 5.7.12 on Debian (Debian 3.2.78-1 x86_64 GNU/Linux), and the server has been hanging every few hours since. The syslog and mysql.log are being flooded with this:

2016-06-13T18:05:20.261209Z 0 [ERROR] Error in accept: Bad file descriptor

MySQL info: mysql Ver 14.14 Distrib 5.7.12-5, for debian-linux-gnu (x86_64) using 6.2

Relevant pieces of the [mysqld] section of my.cnf, in case tweaking any of these values could help:

[mysqld]
max_allowed_packet      = 64M
thread_stack            = 256K
thread_cache_size       = 8

max_connections         = 150
max_connect_errors      = 10000
connect_timeout         = 30
wait_timeout            = 86400
table_open_cache        = 2048
open_files_limit        = 65535

query_cache_limit       = 4M
query_cache_size        = 128M
query_cache_type        = 1

server-id               = 1
log_bin                 = /var/log/mysql/mysql-bin.log
expire_logs_days        = 10
max_binlog_size         = 100M

# * InnoDB
innodb_file_per_table
innodb_buffer_pool_instances=2
innodb_buffer_pool_size=2G
thread_pool_size = 24

5 Answers

2 votes

We had the same issue on an Ubuntu 16.04 system with MySQL 5.7.13. We increased our max open files parameter in systemd like this:

/etc/systemd/system/mysql.service.d/10-ulimit.conf

[Service]
LimitNOFILE=1000000

So far the issue has not happened again. Perhaps MySQL simply needs more file descriptors now.
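For anyone applying this on their own box: the drop-in only takes effect after a daemon reload and a service restart, and you can confirm the limit actually reached the running daemon via /proc. A sketch (the unit name `mysql` may differ on your distro):

```
sudo mkdir -p /etc/systemd/system/mysql.service.d
sudo systemctl daemon-reload
sudo systemctl restart mysql
# Verify the new limit took effect for the running process:
grep 'open files' /proc/$(pidof mysqld)/limits
```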

2 votes

I found the problem (or possibly one of the problems). Here is an extract from strace on mysqld:

...
socket(PF_INET6, SOCK_STREAM, IPPROTO_TCP) = 20
write(2, "2017-01-29T22:22:45.433033Z 0 [N"..., 72) = 72
setsockopt(20, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(20, SOL_IPV6, IPV6_V6ONLY, [0], 4) = 0
bind(20, {sa_family=AF_INET6, sin6_port=htons(3306), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
listen(20, 70)                          = 0
fcntl(20, F_GETFL)                      = 0x2 (flags O_RDWR)
fcntl(20, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
 ...
accept(20, {sa_family=AF_INET6, sin6_port=htons(58332), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 37
rt_sigaction(SIGCHLD, {SIG_DFL, [CHLD], SA_RESTORER|SA_RESTART, 0x7f3ddeac84b0}, {SIG_DFL, [], 0}, 8) = 0
getpeername(37, {sa_family=AF_INET6, sin6_port=htons(58332), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
getsockname(37, {sa_family=AF_INET6, sin6_port=htons(3306), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
open("/etc/hosts.allow", O_RDONLY)      = 38
fstat(38, {st_mode=S_IFREG|0644, st_size=589, ...}) = 0
read(38, "# /etc/hosts.allow: list of host"..., 4096) = 589
read(38, "", 4096)                      = 0
close(38)                               = 0
open("/etc/hosts.deny", O_RDONLY)       = 38
fstat(38, {st_mode=S_IFREG|0644, st_size=704, ...}) = 0
read(38, "# /etc/hosts.deny: list of hosts"..., 4096) = 704
close(38)                               = 0
socket(PF_LOCAL, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 38
connect(38, {sa_family=AF_LOCAL, sun_path="/dev/log"}, 110) = 0
sendto(38, "<36>Jan 29 14:23:08 mysqld[13052"..., 72, MSG_NOSIGNAL, NULL, 0) = 72
shutdown(20, SHUT_RDWR)                 = 0
close(20)                               = 0

poll([{fd=20, events=POLLIN}, {fd=22, events=POLLIN}], 2, -1) = 1 ([{fd=20, revents=POLLNVAL}])
accept(-1, 0x7ffe6ebd7160, 0x7ffe6ebd70fc) = -1 EBADF (Bad file descriptor)
write(2, "2017-01-29T22:23:08.109451Z 0 [E"..., 75) = 75
 ... rinse and repeat *REALLY* fast!

In locking down my system with tcp_wrappers I had inadvertently taken mysqld out of both hosts.allow and hosts.deny. It seems that after checking both hosts.allow and hosts.deny, mysqld shuts down and closes the socket as you might expect. However, it then immediately starts to poll the (now non-existent) socket for activity.
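If anyone else has made the same mistake, the point is that mysqld should match *some* rule in the tcp_wrappers files. A hypothetical pair of entries (the addresses are examples; substitute your own trusted hosts):

```
# /etc/hosts.allow -- permit mysqld from trusted sources
mysqld : 127.0.0.1 192.168.0.0/255.255.255.0

# /etc/hosts.deny -- refuse everything else
mysqld : ALL
```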

I just did another test where my tcp_wrappers was correctly configured. When I connect from an authorized host all is fine; however, when I connect from a blocked address the same issue occurs. Based on this I recommend using other tools to secure mysqld, and making your tcp_wrappers config more open than your firewall. That being said, the bug should still be fixed!

This fix has yet to stand the test of time so, as usual, YMMV. Hope it helps anyway.

Nick

0 votes

I have the same issue after upgrading to Percona Cluster 5.7.14-26.17-1.trusty.

The ulimit.conf suggestion doesn't help, and I've made sure that there are sufficient file handles, so far as I can tell, by editing /etc/security/limits.conf and /etc/sysctl.conf.
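For reference, these are the kinds of entries involved; the values here are illustrative, not a recommendation (and note that per-user limits.conf entries may not apply to a service started by init):

```
# /etc/security/limits.conf -- per-user fd limits for the mysql account
mysql  soft  nofile  65535
mysql  hard  nofile  65535

# /etc/sysctl.conf -- system-wide fd cap
fs.file-max = 2097152
```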

I can reproduce this easily by telnetting to port 3306 and then disconnecting; the server then goes into a spin logging this error.

A horrible workaround for this, which looks promising in my environment, is to avoid using TCP connections on port 3306, and use unix sockets instead.

You can proxy from port 3306 to the socket by changing the port number in /etc/mysql/my.cnf and then using socat:

nohup socat TCP4-LISTEN:3306,fork UNIX-CONNECT:/var/run/mysqld/mysqld.sock&

If I then telnet in on port 3306 and disconnect, I can't provoke the problem. I intend to report back on how well this stands up over time.
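Rather than moving mysqld to another port, an alternative is to disable its TCP listener entirely and let socat own port 3306. A sketch of the my.cnf change (skip-networking is a standard option, but verify it against your version, and note it breaks any clients that must reach mysqld over TCP directly):

```
[mysqld]
skip-networking                          # no TCP listener at all
socket = /var/run/mysqld/mysqld.sock     # clients (and socat) connect here
```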

FWIW, the code looks as though it expects this to happen sometimes:

for (uint retry= 0; retry < MAX_ACCEPT_RETRY; retry++)
{
  socket_len_t length= sizeof(struct sockaddr_storage);
  connect_sock= mysql_socket_accept(key_socket_client_connection, listen_sock,
                                    (struct sockaddr *)(&cAddr), &length);
  if (mysql_socket_getfd(connect_sock) != INVALID_SOCKET ||
      (socket_errno != SOCKET_EINTR && socket_errno != SOCKET_EAGAIN))
    break;
}
if (mysql_socket_getfd(connect_sock) == INVALID_SOCKET)
{
  /*
    accept(2) failed on the listening port, after many retries.
    There is not much details to report about the client,
    increment the server global status variable.
  */
  connection_errors_accept++;
  if ((m_error_count++ & 255) == 0) // This can happen often
    sql_print_error("Error in accept: %s", strerror(errno));
  if (socket_errno == SOCKET_ENFILE || socket_errno == SOCKET_EMFILE)
    sleep(1);             // Give other threads some time
  return NULL;
}
0 votes

I came here with the same error and none of the solutions worked, but after some research on our end we found that it was AppArmor denying access to our logging directory, which was causing the bad file descriptor error messages.
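If you hit the same thing, the denials show up as DENIED lines in the kernel log, and the fix is a rule in the mysqld AppArmor profile, followed by reloading it with `sudo apparmor_parser -r` on the profile file. A sketch, assuming the stock Debian/Ubuntu profile path and using our log directory as the example (adjust both to your setup):

```
# /etc/apparmor.d/usr.sbin.mysqld (excerpt) -- allow mysqld its log dir
/var/log/mysql/ r,
/var/log/mysql/** rw,
```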