I have a Flink streaming application written in Java that uses Kafka as its input source. The application uses four streams in total: one is the main data stream, and the other three are combined into a broadcast stream.
Stream A is the main stream; it flows continuously from Kafka.
Stream B is the enrichment data. It is the combination of Stream C, Stream D, and Stream E, and it is large (each of the three streams is large).
Streams C, D, and E carry different object types (for example, one is of type Employee, another is AttendanceDetails, and another is SalaryDetails).
I combined the three streams into one using the Either type and broadcast the result as Stream B. I am able to receive its elements in the broadcast process function and access the broadcast state through the context (i.e. in processBroadcastElement()).
My questions are:
Is it possible to store large data in broadcast state?
Is it possible to broadcast large data?
If storing large data is possible, how much data (i.e. what data size) can be stored in broadcast state, and do fault tolerance and Flink checkpoints still apply to it? My Flink system's memory and storage sizes are:
Memory: 8 GB
Disk Size: 20-25 GB
How do I configure the memory size for broadcast state in Flink?
Note: As I understand it, Flink broadcast state is kept in memory at runtime (that is, it is not stored in RocksDB), because the RocksDB state backend is currently not available for operator state, and a broadcast stream is meant to be used as a low-throughput event stream.
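
To make the sizing concern concrete, here is a rough back-of-the-envelope sketch in plain Java (no Flink dependency). It assumes that every parallel instance of the operator holding the broadcast state keeps its own full in-memory copy, and that all instances run on the same 8 GB machine. The per-copy size, the parallelism, and the fraction of memory usable for state are hypothetical placeholders, not measured values from my job.

```java
public class BroadcastStateEstimate {

    /** Total bytes held when each of `parallelism` subtasks keeps a full copy of the state. */
    public static long totalStateBytes(long stateBytesPerCopy, int parallelism) {
        return stateBytesPerCopy * parallelism;
    }

    /** True if the replicated state fits into the share of memory set aside for it. */
    public static boolean fits(long stateBytesPerCopy, int parallelism,
                               long memoryBytes, double usableFraction) {
        long budget = (long) (memoryBytes * usableFraction);
        return totalStateBytes(stateBytesPerCopy, parallelism) <= budget;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;

        long perCopy = 1 * gib;      // hypothetical combined size of Streams C, D and E
        int parallelism = 4;         // hypothetical operator parallelism
        long memory = 8 * gib;       // machine memory from the question
        double usableFraction = 0.5; // hypothetical share of memory left for state

        System.out.println("Total replicated state (bytes): "
                + totalStateBytes(perCopy, parallelism));
        System.out.println("Fits in memory budget: "
                + fits(perCopy, parallelism, memory, usableFraction));
    }
}
```

Under these assumptions the required memory grows linearly with the parallelism, which is why I am unsure whether broadcasting a large enrichment data set is feasible on this machine.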