8 votes

I am facing a resource problem in my Netty server application.

[io.netty.channel.DefaultChannelPipeline] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.: java.io.IOException: Too many open files
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) [rt.jar:1.7.0_60]
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241) [rt.jar:1.7.0_60]
    at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:135) [netty-all-4.0.25.Final.jar:4.0.25.Final]
    at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:69) [netty-all-4.0.25.Final.jar:4.0.25.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) [netty-all-4.0.25.Final.jar:4.0.25.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) [netty-all-4.0.25.Final.jar:4.0.25.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) [netty-all-4.0.25.Final.jar:4.0.25.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) [netty-all-4.0.25.Final.jar:4.0.25.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) [netty-all-4.0.25.Final.jar:4.0.25.Final]
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) [netty-all-4.0.25.Final.jar:4.0.25.Final]
    at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_60]

As a workaround I increased the maximum number of open files with ulimit -n, but I can still watch the number of files/sockets increasing:

lsof -p 5604 | grep socket | wc -l

It is now well over 3000...

I can't see any open or hanging connections with netstat...

I use a ReadTimeoutHandler to close unused connections, with the following exceptionCaught() code:

@Override
public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) throws Exception {
  if (cause instanceof ReadTimeoutException) {
    logger.debug("Read timeout - close connection");
  } else {
    logger.info(cause.getMessage());
  }
  ctx.close();
}
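
To check whether ctx.close() actually completes for every channel, a listener can be attached to the close future. Below is a minimal sketch of the same handler as a standalone class; the class name ClosingExceptionHandler and the SLF4J logger are placeholders, and the listener only adds diagnostics:

import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.handler.timeout.ReadTimeoutException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ClosingExceptionHandler extends ChannelInboundHandlerAdapter {

    private static final Logger logger = LoggerFactory.getLogger(ClosingExceptionHandler.class);

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) throws Exception {
        if (cause instanceof ReadTimeoutException) {
            logger.debug("Read timeout - close connection");
        } else {
            logger.info(cause.getMessage());
        }
        // Log when the close future completes, so channels that never finish
        // closing become visible in the logs.
        ctx.close().addListener(new ChannelFutureListener() {
            @Override
            public void operationComplete(ChannelFuture future) {
                logger.debug("Channel " + future.channel() + " closed, success=" + future.isSuccess());
            }
        });
    }
}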

The server bootstrap looks like this:

ServerBootstrap b = new ServerBootstrap();
b.group(bossGroup, workerGroup)
 .channel(NioServerSocketChannel.class)
 .childHandler(new ChannelInitializer<SocketChannel>() {
     @Override
     public void initChannel(SocketChannel ch) throws Exception {
         ch.pipeline().addLast(new ReadTimeoutHandler(60));
         ch.pipeline().addLast(new LoggingHandler(mySpec.getPortLookupKey().toLowerCase()));
         ch.pipeline().addLast(new RawMessageEncoder());
         ch.pipeline().addLast(new RawMessageDecoder());
         ch.pipeline().addLast(new RequestServerHandler(ctx.getWorkManager(), factory));
     }
 })
 .option(ChannelOption.SO_BACKLOG, 128)
 .childOption(ChannelOption.SO_KEEPALIVE, true);

ChannelFuture channelFuture = b.bind(port).sync();
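
For completeness, the shutdown side looks roughly like this; a sketch only, assuming bossGroup and workerGroup are the groups passed to the bootstrap above (this is the usual cleanup and does not by itself explain a per-connection leak):

// Wait until the server channel is closed, then release all event loop resources.
try {
    channelFuture.channel().closeFuture().sync();
} finally {
    bossGroup.shutdownGracefully();
    workerGroup.shutdownGracefully();
}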

Did I miss something? Shouldn't the number of open files decrease once a connection is closed (either by the remote host or by the timeout handler)?

What do I need to change to save resources here?

Update: I am using Netty 4.0.25.

Update 2: As requested, I moved the logging handler in front of the ReadTimeoutHandler; here are the logs. Situation where the client disconnects normally:

09:41:39,755 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 => /127.0.0.1:4300] REGISTERED
09:41:39,756 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 => /127.0.0.1:4300] ACTIVE
09:41:39,810 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 => /127.0.0.1:4300] RECEIVED(1024B)
09:41:39,813 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 => /127.0.0.1:4300] RECEIVED(1024B)
09:41:39,814 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 => /127.0.0.1:4300] RECEIVED(150B)
09:41:40,854 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 => /127.0.0.1:4300] WRITE(1385B)
09:41:40,855 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 => /127.0.0.1:4300] FLUSH
09:41:40,861 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 :> /127.0.0.1:4300] INACTIVE
09:41:40,864 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 :> /127.0.0.1:4300] UNREGISTERED

Situation, where the client does NOT disconnect:

10:04:24,104 [3-1] [id: 0x48076684, /127.0.0.1:50525 => /127.0.0.1:4300] REGISTERED
10:04:24,107 [3-1] [id: 0x48076684, /127.0.0.1:50525 => /127.0.0.1:4300] ACTIVE
10:04:24,594 [3-1] [id: 0x48076684, /127.0.0.1:50525 => /127.0.0.1:4300] RECEIVED(1024B)
10:04:24,597 [3-1] [id: 0x48076684, /127.0.0.1:50525 => /127.0.0.1:4300] RECEIVED(1024B)
10:04:24,598 [3-1] [id: 0x48076684, /127.0.0.1:50525 => /127.0.0.1:4300] RECEIVED(150B)
10:04:25,638 [3-1] [id: 0x48076684, /127.0.0.1:50525 => /127.0.0.1:4300] WRITE(1383B)
10:04:25,639 [3-1] [id: 0x48076684, /127.0.0.1:50525 => /127.0.0.1:4300] FLUSH
10:05:25,389 [3-1] [id: 0x48076684, /127.0.0.1:50525 => /127.0.0.1:4300] CLOSE()
10:05:25,390 [3-1] [id: 0x48076684, /127.0.0.1:50525 :> /127.0.0.1:4300] CLOSE()
10:05:25,390 [3-1] [id: 0x48076684, /127.0.0.1:50525 :> /127.0.0.1:4300] INACTIVE
10:05:25,394 [3-1] [id: 0x48076684, /127.0.0.1:50525 :> /127.0.0.1:4300] UNREGISTERED

So there is a 60-second gap before the close (as expected from the ReadTimeoutHandler).

After some more analysis, I have the impression that even with a normal disconnect from the client the number of open files increases! Also, there is no CLOSE() in that situation...
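
One way to narrow this down is to count the channels Netty itself still considers open and compare that number with the lsof output; if the two diverge, the leaked descriptors are not the connection sockets. A sketch using a ChannelGroup (closed channels are removed from the group automatically); the class and field names are placeholders:

import io.netty.channel.ChannelInitializer;
import io.netty.channel.group.ChannelGroup;
import io.netty.channel.group.DefaultChannelGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.util.concurrent.GlobalEventExecutor;

public class CountingChannelInitializer extends ChannelInitializer<SocketChannel> {

    // Closed channels are removed automatically, so size() reflects what Netty
    // still considers open.
    public static final ChannelGroup OPEN_CHANNELS =
            new DefaultChannelGroup(GlobalEventExecutor.INSTANCE);

    @Override
    public void initChannel(SocketChannel ch) throws Exception {
        OPEN_CHANNELS.add(ch);
        // ... add ReadTimeoutHandler, LoggingHandler, the codecs and the
        // RequestServerHandler here, exactly as in the bootstrap above ...
    }
}

// Compared against lsof from a timer or on demand, e.g.:
// logger.info("Netty open channels: " + CountingChannelInitializer.OPEN_CHANNELS.size());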

Which Netty version are you using? - HCarrasko
Could you insert a LoggingHandler with a high enough log level before the ReadTimeoutHandler in your pipeline, and update your question with the log? - trustin
Maybe it's something more related to the OS; those connection events seem pretty normal to me, and I have never seen such a problem. There is a new Netty version, 4.0.28 - can you try it? - crigore
So are you closing connections when the peer disconnects? - user207421
@EJP: All my handlers contain this method: 'public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) throws Exception { ctx.channel().close(); }' What else would I need to do to close the connection? - rik

1 Answer

1 vote

Maybe it's related to this Netty issue: https://github.com/netty/netty/issues/1731

This is expected behaviour, and there is nothing that can be changed. The JVM is signalling that it is unable to accept the channel, therefore no connection can be initiated and no response can be sent. The client will see a connection failure. If you have a load balancer, it should retry against an alternate host or return a 503 on your application's behalf.
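
If the goal is only to stop the exception from reaching the tail of the pipeline, the accept failure can be handled on the listening channel itself by setting a handler(...) on the ServerBootstrap (in addition to childHandler(...)). A rough sketch; the class name is a placeholder, and this only reports the failure in a controlled way rather than fixing the descriptor exhaustion:

import java.io.IOException;

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Catches accept() failures on the listening (parent) channel so they no longer
// fall through to the tail of the pipeline. Registered via
// b.handler(new AcceptErrorHandler()) next to the existing childHandler(...).
class AcceptErrorHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) throws Exception {
        if (cause instanceof IOException) {
            // e.g. "Too many open files": log it and keep the server channel alive.
            System.err.println("Accept failed: " + cause.getMessage());
        } else {
            ctx.fireExceptionCaught(cause);
        }
    }
}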