2
votes

While deploying Flink, I got the following OOM error messages:

org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: java.lang.OutOfMemoryError: Direct buffer memory
    at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.exceptionCaught(PartitionRequestClientHandler.java:153)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:246)
    at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)

Caused by: io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:234)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
    ... 9 more

Caused by: java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:658)
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)

I set 'taskmanager.network.numberOfBuffers: 120000' in the flink-conf file, but the error still occurs.

Number of TaskManagers: 50, Memory per TaskManager: 16GB, Cores per TaskManager: 16, Number of Slots per TaskManager: 8

The job I ran used a parallelism of 25. The raw data file is about 300GB, and the job contains many join operations, which, I guess, require a lot of network communication.

Please let me know if you have any idea what is going on here.

1
Did you have a look at flink.apache.org/… - Matthias J. Sax

1 Answer

3
votes

Which version of Flink are you using? Flink 0.10.0 and 0.10.1 have an issue with an upgraded Netty version. This issue was fixed about three weeks ago, and the fix is not yet available in a release.

The fix is available in the master branch (published as 1.0-SNAPSHOT) and in the 0.10 branch.
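
Separately from the Netty issue, it may be worth sanity-checking the buffer count from the question. The sketch below is only a rough estimate under two assumptions that are not stated in the question: that the rule of thumb from the Flink configuration documentation (slots per TaskManager squared, times number of TaskManagers, times 4) applies to this job, and that each network buffer has the default size of 32 KiB. The function names are hypothetical helpers, not part of any Flink API.

```python
# Rough sizing sketch for taskmanager.network.numberOfBuffers.
# Assumption: rule of thumb from the Flink configuration docs,
#   buffers = slots_per_tm^2 * num_tms * 4
# Assumption: each network buffer is 32 KiB (the default size).

def recommended_network_buffers(slots_per_tm: int, num_tms: int) -> int:
    """Return the rule-of-thumb buffer count for the given cluster shape."""
    return slots_per_tm ** 2 * num_tms * 4


def buffer_memory_bytes(num_buffers: int, buffer_size_bytes: int = 32 * 1024) -> int:
    """Return the memory taken by num_buffers buffers of the given size."""
    return num_buffers * buffer_size_bytes


# Cluster shape from the question: 8 slots per TaskManager, 50 TaskManagers.
recommended = recommended_network_buffers(slots_per_tm=8, num_tms=50)
configured = 120_000  # value set in the question

print("recommended buffers:", recommended)
print("memory for recommended (MiB):", buffer_memory_bytes(recommended) / 2**20)
print("memory for configured (GiB):", round(buffer_memory_bytes(configured) / 2**30, 2))
```

Under these assumptions the rule of thumb gives 12800 buffers (400 MiB per TaskManager), while 120000 buffers would take roughly 3.66 GiB per TaskManager, so the configured value is already far above the estimate and raising it further is unlikely to help.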