[Screenshot: TaskManager memory metrics as shown in the Flink web UI]

I am running Flink version 1.8.

The main configuration is as follows:

env.java.opts: -Djavax.net.ssl.keyStoreType=JKS -Djavax.net.ssl.trustStoreType=JKS
taskmanager.heap.size: 12288m
taskmanager.numberOfTaskSlots: 7

The declared heap size is 12 GB, so why does the overview section show 7.33 GB?

As per the docs, heap size = declared heap size - network buffer memory (default: 0.1 times the declared heap, capped at 1 GB). So the correct value is what's shown in the JVM (Heap/Non-Heap) section, i.e. 11 GB.

Network Memory Segments: I assume that, since 1 GB is now used as network buffer memory, the 32768 segments refer to the count of memory segments of 32 KiB each. These are used by the TCP channels that transfer data between tasks. My understanding is that this memory is still on heap (and hence subtracted from the declared heap) but allocated more like ByteBuffer.allocate(). Is that correct?
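The arithmetic behind the 11 GB and 32768 figures can be checked with a short sketch. It assumes the Flink 1.8 defaults for taskmanager.network.memory.fraction (0.1), taskmanager.network.memory.min (64mb), taskmanager.network.memory.max (1gb) and taskmanager.memory.segment-size (32kb); the variable names are only illustrative.

```python
# Sketch of the numbers in the question, assuming Flink 1.8 default settings.
declared_heap_mb = 12288      # taskmanager.heap.size: 12288m
network_fraction = 0.1        # assumed default: taskmanager.network.memory.fraction
network_min_mb = 64           # assumed default: taskmanager.network.memory.min
network_max_mb = 1024         # assumed default: taskmanager.network.memory.max
segment_size_kb = 32          # assumed default: taskmanager.memory.segment-size

# Network buffer memory is the fraction of the declared size, clamped to [min, max].
network_mb = min(max(declared_heap_mb * network_fraction, network_min_mb), network_max_mb)

# What remains after subtracting the network buffers.
jvm_heap_mb = declared_heap_mb - network_mb

# Number of fixed-size network memory segments that fit into the buffer memory.
segments = int(network_mb * 1024 // segment_size_kb)

print(network_mb, jvm_heap_mb, segments)  # 1024 11264 32768
```

With these inputs the fraction (about 1229 MB) exceeds the 1 GB cap, so the cap applies, leaving 11264 MB (11 GB) and 32768 segments of 32 KiB.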

Following the blog post "Juggling with Bits and Bytes", specifically the statement "By default 70% of the JVM heap that is available after service initialisation is allocated by the MemoryManager", this is the memory that tasks use, in the form of memory segments, to buffer data for checkpoint alignment, broadcast data, window data, etc. Since taskmanager.memory.off-heap = false in this case, this memory will be allocated on heap. Hence I assume that the 4.95 GB of used memory shown here is the memory that tasks use to buffer data for various purposes, taken out of the managed memory, which should be 11 GB * 0.7 = 7.7 GB.

How can I get access to this managed memory metric? Is there a metric exposed for it?
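The managed memory estimate above can be written out as a small sketch. It assumes the default taskmanager.memory.fraction of 0.7; the function name and the used_by_services_gb parameter are hypothetical. Because the quoted sentence applies the fraction to the heap that is free after service initialisation, 7.7 GB is best read as an upper bound.

```python
def managed_memory_estimate_gb(heap_gb, fraction=0.7, used_by_services_gb=0.0):
    """Rough estimate of on-heap managed memory (hypothetical helper).

    heap_gb             -- JVM heap size in GB
    fraction            -- assumed default of taskmanager.memory.fraction
    used_by_services_gb -- heap already occupied after service start-up (unknown here)
    """
    return (heap_gb - used_by_services_gb) * fraction


# Upper bound from the question: the full 11 GB heap counted as free.
upper_bound_gb = managed_memory_estimate_gb(11.0)
print(round(upper_bound_gb, 2))  # 7.7
```

Any heap consumed during start-up lowers the result, so the real managed memory size would be somewhat below this figure.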

Also,

What do the Direct Memory and Mapped Memory metrics refer to? I am using RocksDB as my state backend. So is it the size of the state, which is managed off heap? How does Flink determine its capacity and usage? What kinds of issues could surface if this value is misconfigured, if that is possible?

Also, this is a streaming job, if that matters.
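For reference, the Direct and Mapped figures should also be available outside the web UI as TaskManager system metrics under Status.JVM.Memory. The sketch below only builds the REST URL for querying them; the helper name, the host and port, and the TaskManager ID "tm-1" are placeholders, and it is an assumption that the cluster exposes the REST endpoint at that address.

```python
# Metric names as listed under Flink's system metrics (Status.JVM.Memory scope).
JVM_BUFFER_POOL_METRICS = [
    "Status.JVM.Memory.Direct.Count",
    "Status.JVM.Memory.Direct.MemoryUsed",
    "Status.JVM.Memory.Direct.TotalCapacity",
    "Status.JVM.Memory.Mapped.Count",
    "Status.JVM.Memory.Mapped.MemoryUsed",
    "Status.JVM.Memory.Mapped.TotalCapacity",
]


def taskmanager_metrics_url(base_url, taskmanager_id, metric_names):
    """Build the URL for /taskmanagers/<id>/metrics?get=<comma-separated names>."""
    names = ",".join(metric_names)
    return f"{base_url.rstrip('/')}/taskmanagers/{taskmanager_id}/metrics?get={names}"


# "http://localhost:8081" and "tm-1" are placeholders, not values from the question.
url = taskmanager_metrics_url("http://localhost:8081", "tm-1", JVM_BUFFER_POOL_METRICS)
print(url)
```

Fetching that URL with any HTTP client should return a JSON list of metric IDs and values, which can then be compared with what the web UI shows.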

1 Answer

The network buffers are off-heap. That's the one point I can answer with confidence.

For the rest, see the section on memory towards the end of this blog post: https://flink.apache.org/news/2019/02/25/monitoring-best-practices.html. Hopefully that will help.