8 votes

I am facing a resource problem in my Netty server application.

[io.netty.channel.DefaultChannelPipeline] An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.: java.io.IOException: Too many open files
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) [rt.jar:1.7.0_60]
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:241) [rt.jar:1.7.0_60]
    at io.netty.channel.socket.nio.NioServerSocketChannel.doReadMessages(NioServerSocketChannel.java:135) [netty-all-4.0.25.Final.jar:4.0.25.Final]
    at io.netty.channel.nio.AbstractNioMessageChannel$NioMessageUnsafe.read(AbstractNioMessageChannel.java:69) [netty-all-4.0.25.Final.jar:4.0.25.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) [netty-all-4.0.25.Final.jar:4.0.25.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) [netty-all-4.0.25.Final.jar:4.0.25.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) [netty-all-4.0.25.Final.jar:4.0.25.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) [netty-all-4.0.25.Final.jar:4.0.25.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) [netty-all-4.0.25.Final.jar:4.0.25.Final]
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) [netty-all-4.0.25.Final.jar:4.0.25.Final]
    at java.lang.Thread.run(Thread.java:745) [rt.jar:1.7.0_60]

As a workaround I increased the maximum number of open files with ulimit -n, but I can still watch the number of files/sockets increasing:

lsof -p 5604 | grep socket | wc -l

It is now well over 3000...

I can't see any open or hanging connections with netstat...

I use a ReadTimeoutHandler to close unused connections, with the following exceptionCaught() code:

@Override
public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) throws Exception {
  if (cause instanceof ReadTimeoutException) {
    logger.debug("Read timeout - close connection");
  } else {
    logger.info(cause.getMessage());
  }
  ctx.close();
}
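
To check whether ctx.close() actually completes for every channel, a listener can be attached to the close future. Below is a minimal sketch of the same handler as a standalone class; the class name ClosingExceptionHandler and the SLF4J logger are placeholders, and the listener only adds diagnostics:

import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.handler.timeout.ReadTimeoutException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ClosingExceptionHandler extends ChannelInboundHandlerAdapter {

    private static final Logger logger = LoggerFactory.getLogger(ClosingExceptionHandler.class);

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) throws Exception {
        if (cause instanceof ReadTimeoutException) {
            logger.debug("Read timeout - close connection");
        } else {
            logger.info(cause.getMessage());
        }
        // Log when the close future completes, so channels that never finish
        // closing become visible in the logs.
        ctx.close().addListener(new ChannelFutureListener() {
            @Override
            public void operationComplete(ChannelFuture future) {
                logger.debug("Channel " + future.channel() + " closed, success=" + future.isSuccess());
            }
        });
    }
}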

The server bootstrap looks like this:

ServerBootstrap b = new ServerBootstrap();
b.group(bossGroup, workerGroup)
 .channel(NioServerSocketChannel.class)
 .childHandler(new ChannelInitializer<SocketChannel>() {
     @Override
     public void initChannel(SocketChannel ch) throws Exception {
         ch.pipeline().addLast(new ReadTimeoutHandler(60));
         ch.pipeline().addLast(new LoggingHandler(mySpec.getPortLookupKey().toLowerCase()));
         ch.pipeline().addLast(new RawMessageEncoder());
         ch.pipeline().addLast(new RawMessageDecoder());
         ch.pipeline().addLast(new RequestServerHandler(ctx.getWorkManager(), factory));
     }
 })
 .option(ChannelOption.SO_BACKLOG, 128)
 .childOption(ChannelOption.SO_KEEPALIVE, true);

ChannelFuture channelFuture = b.bind(port).sync();
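
For completeness, the shutdown side looks roughly like this; a sketch only, assuming bossGroup and workerGroup are the groups passed to the bootstrap above (this is the usual cleanup and does not by itself explain a per-connection leak):

// Wait until the server channel is closed, then release all event loop resources.
try {
    channelFuture.channel().closeFuture().sync();
} finally {
    bossGroup.shutdownGracefully();
    workerGroup.shutdownGracefully();
}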

Did I miss something? Shouldn't the number of open files decrease once a connection is closed (either by the remote host or by the timeout handler)?

What do I need to change to save resources here?

Update: I am using Netty 4.0.25.

Update 2: As requested, I moved the logging handler in front of the ReadTimeoutHandler; here are the logs. Situation where the client disconnects normally:

09:41:39,755 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 => /127.0.0.1:4300] REGISTERED
09:41:39,756 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 => /127.0.0.1:4300] ACTIVE
09:41:39,810 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 => /127.0.0.1:4300] RECEIVED(1024B)
09:41:39,813 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 => /127.0.0.1:4300] RECEIVED(1024B)
09:41:39,814 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 => /127.0.0.1:4300] RECEIVED(150B)
09:41:40,854 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 => /127.0.0.1:4300] WRITE(1385B)
09:41:40,855 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 => /127.0.0.1:4300] FLUSH
09:41:40,861 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 :> /127.0.0.1:4300] INACTIVE
09:41:40,864 [3-1] [id: 0xca6601a2, /127.0.0.1:64258 :> /127.0.0.1:4300] UNREGISTERED

Situation, where the client does NOT disconnect:

10:04:24,104 [3-1] [id: 0x48076684, /127.0.0.1:50525 => /127.0.0.1:4300] REGISTERED
10:04:24,107 [3-1] [id: 0x48076684, /127.0.0.1:50525 => /127.0.0.1:4300] ACTIVE
10:04:24,594 [3-1] [id: 0x48076684, /127.0.0.1:50525 => /127.0.0.1:4300] RECEIVED(1024B)
10:04:24,597 [3-1] [id: 0x48076684, /127.0.0.1:50525 => /127.0.0.1:4300] RECEIVED(1024B)
10:04:24,598 [3-1] [id: 0x48076684, /127.0.0.1:50525 => /127.0.0.1:4300] RECEIVED(150B)
10:04:25,638 [3-1] [id: 0x48076684, /127.0.0.1:50525 => /127.0.0.1:4300] WRITE(1383B)
10:04:25,639 [3-1] [id: 0x48076684, /127.0.0.1:50525 => /127.0.0.1:4300] FLUSH
10:05:25,389 [3-1] [id: 0x48076684, /127.0.0.1:50525 => /127.0.0.1:4300] CLOSE()
10:05:25,390 [3-1] [id: 0x48076684, /127.0.0.1:50525 :> /127.0.0.1:4300] CLOSE()
10:05:25,390 [3-1] [id: 0x48076684, /127.0.0.1:50525 :> /127.0.0.1:4300] INACTIVE
10:05:25,394 [3-1] [id: 0x48076684, /127.0.0.1:50525 :> /127.0.0.1:4300] UNREGISTERED

So there is a 60-second gap before the close (as expected from the ReadTimeoutHandler).

After some more analysis, I have the impression that even with a normal disconnect from the client the number of open files increases! Also, there is no CLOSE() in that situation...
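
One way to narrow this down is to count the channels Netty itself still considers open and compare that number with the lsof output; if the two diverge, the leaked descriptors are not the connection sockets. A sketch using a ChannelGroup (closed channels are removed from the group automatically); the class and field names are placeholders:

import io.netty.channel.ChannelInitializer;
import io.netty.channel.group.ChannelGroup;
import io.netty.channel.group.DefaultChannelGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.util.concurrent.GlobalEventExecutor;

public class CountingChannelInitializer extends ChannelInitializer<SocketChannel> {

    // Closed channels are removed automatically, so size() reflects what Netty
    // still considers open.
    public static final ChannelGroup OPEN_CHANNELS =
            new DefaultChannelGroup(GlobalEventExecutor.INSTANCE);

    @Override
    public void initChannel(SocketChannel ch) throws Exception {
        OPEN_CHANNELS.add(ch);
        // ... add ReadTimeoutHandler, LoggingHandler, the codecs and the
        // RequestServerHandler here, exactly as in the bootstrap above ...
    }
}

// Compared against lsof from a timer or on demand, e.g.:
// logger.info("Netty open channels: " + CountingChannelInitializer.OPEN_CHANNELS.size());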

Which Netty version are you using? - HCarrasko
Could you insert a LoggingHandler with a high enough log level before the ReadTimeoutHandler in your pipeline, and update your question with the log? - trustin
Maybe it's something more related to the OS; those connection events seem pretty normal to me, and I have never seen such a problem. There is a new Netty version, 4.0.28 - can you try it? - crigore
So are you closing connections when the peer disconnects? - user207421
@EJP: All my handlers contain this method: 'public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) throws Exception { ctx.channel().close(); }' What else would I need to do to close the connection? - rik

1 Answer

1 vote

Maybe it's related to this Netty issue: https://github.com/netty/netty/issues/1731

This is expected behaviour, and there is nothing that can be changed. The JVM is signalling that it is unable to accept the channel, therefore no connection can be initiated and no response can be sent. The client will see a connection failure. If you have a load balancer, it should retry against an alternate host or return a 503 on your application's behalf.
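
If the goal is only to stop the exception from reaching the tail of the pipeline, the accept failure can be handled on the listening channel itself by setting a handler(...) on the ServerBootstrap (in addition to childHandler(...)). A rough sketch; the class name is a placeholder, and this only reports the failure in a controlled way rather than fixing the descriptor exhaustion:

import java.io.IOException;

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Catches accept() failures on the listening (parent) channel so they no longer
// fall through to the tail of the pipeline. Registered via
// b.handler(new AcceptErrorHandler()) next to the existing childHandler(...).
class AcceptErrorHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) throws Exception {
        if (cause instanceof IOException) {
            // e.g. "Too many open files": log it and keep the server channel alive.
            System.err.println("Accept failed: " + cause.getMessage());
        } else {
            ctx.fireExceptionCaught(cause);
        }
    }
}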