The situation: I have a proxy app that uses Netty 4.0.17.Final (FYI: I already had the issue with versions 4.0.13.Final and 4.0.9.Final) and that's based on the proxy from the Netty examples.
The main difference between my code and the example is that my code doesn't connect to the backend server when the channel goes active, but only on the first read as this read must first do some checks on the input before connecting and forwarding that message to the backend server.
I have unit tested and load tested my app for hours and it works fine.
The problem:
As the first message received needs to perform some blocking operation I tried to use a separate EventExecutorGroup
for that one handler that does that (so that the IO threads don't get blocked):
private static final EventExecutorGroup handlersExecutor = new DefaultEventExecutorGroup(10);
...
pipeline.addLast(handlersExecutor, "authenticationHandler", new FrontendHandler(outboundAddress));
This (= the only change that I made!) breaks the application during load tests. What breaks? XXX out of 3500 client connections report me that YY out of 500 messages for those clients didn't get a reply from the proxy (each request should get one response). An excerpt from the client logs:
2014-02-14 00:39:56.146 [id: 0x34cb2c60] ERROR (com.nsn.ucpsimulator.common.UcpDecoder) - Idle connection (/127.0.0.1:7201). PDUs received: 13
2014-02-14 00:39:56.146 [id: 0xf0955993] ERROR (com.nsn.ucpsimulator.common.UcpDecoder) - Idle connection (/127.0.0.1:7201). PDUs received: 13
2014-02-14 00:39:56.147 [id: 0x9a911fa3] ERROR (com.nsn.ucpsimulator.common.UcpDecoder) - Idle connection (/127.0.0.1:7201). PDUs received: 13
2014-02-14 00:39:56.149 [id: 0x811bbadf] ERROR (com.nsn.ucpsimulator.common.UcpDecoder) - Idle connection (/127.0.0.1:7201). PDUs received: 13
2014-02-14 00:39:56.150 [id: 0x0c4d4c5a] ERROR (com.nsn.ucpsimulator.common.UcpDecoder) - Idle connection (/127.0.0.1:7201). PDUs received: 13
The proxy app tells me that 500 messages were received and were forwarded, but that only 13 replies were received and forwarded back to the client:
2014-02-14 00:39:57.683 [id: 0x39af563b] ERROR (be.demmel.fun.UcpDecoder) - Idle connection (/127.0.0.1:49359). PDUs received: 500
2014-02-14 00:39:57.683 [id: 0x82056d39] ERROR (be.demmel.fun.FrontendHandler) - Idle connection (/127.0.0.1:52004), closing it. PDUs forwarded: 500. Success: 500
2014-02-14 00:40:00.717 [id: 0xcdca8f66] ERROR (be.demmel.fun.UcpDecoder) - Idle connection (/127.0.0.1:7900). PDUs received: 13
2014-02-14 00:40:00.718 [id: 0xcdca8f66] ERROR (be.demmel.fun.BackendHandler) - Idle connection (/127.0.0.1:7900). PDUs forwarded: 13. Success: 13
The server tells me that all is well:
2014-02-14 00:40:02.855 [id: 0x4980be2c] ERROR (com.nsn.ucpsimulator.common.UcpDecoder) - Idle connection (/127.0.0.1:37944). PDUs received: 500
2014-02-14 00:40:02.856 [id: 0x4980be2c] ERROR (com.nsn.ucpsimulator.server.TestUcpHandler) - Idle connection (/127.0.0.1:37944). PDUs sent back: 500
Does somebody know what could cause this?
Additional info:
note that everything works fine until I start using a separate
EventExecutorGroup
for the blocking handler.every time XX clients block, they all block at the same amount of replies forwarded to the client.
I have uploaded the netty code here (it's runnable, contains the proxy, server and client apps with a README): https://github.com/AndrewBourgeois/ucp-proxy/tree/master/src/main/java/be/demmel/fun
When the proxy app gets killed, this error pops up at the server side:
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_45]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.7.0_45]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.7.0_45]
at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.7.0_45]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) ~[na:1.7.0_45]
at io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:401) ~[netty-all-4.0.9.Final.jar:na]
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:869) ~[netty-all-4.0.9.Final.jar:na]
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:208) ~[netty-all-4.0.9.Final.jar:na]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:87) ~[netty-all-4.0.9.Final.jar:na]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:478) ~[netty-all-4.0.9.Final.jar:na]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:447) ~[netty-all-4.0.9.Final.jar:na]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:341) ~[netty-all-4.0.9.Final.jar:na]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101) [netty-all-4.0.9.Final.jar:na]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
I believe that this error indicates that my Netty handlers aren't processing the server replies.