
I have created a streaming program to stream the MongoDB oplog using Flink and Kafka. As per a discussion with the Flink support team, the ordering of a stream cannot be guaranteed across Kafka partitions. I have created N Kafka partitions and want to create one Flink Kafka consumer per partition, so that ordering is maintained at least within a specific partition. Please suggest whether it is possible to create a partition-specific Flink Kafka consumer.

I am using env.setParallelism(N) for parallel processing.

The attached image shows the high-level architecture of the program.


1 Answer


After doing a lot of research, I found the solution to my own question. As global ordering across Kafka partitions is not practical, I created N Kafka partitions with a Flink parallelism of N and wrote a custom Kafka partitioner which overrides the default Kafka partitioning strategy and sends records to a specific partition according to the logic specified in the partitioner. This ensures that related messages always go to the same partition.
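Below is a minimal sketch of the kind of custom partitioner described above, assuming the oplog records are produced with a stable key (for example a document or collection id); the class name `KeyBasedPartitioner` and the key choice are illustrative, not taken from the original program:

```java
import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

// Routes every record with the same key to the same partition, so records
// sharing a key keep their relative order within that partition.
public class KeyBasedPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionCountForTopic(topic);
        if (keyBytes == null) {
            // Assumption: unkeyed records fall back to a fixed partition.
            return 0;
        }
        // Hash the key bytes and map the result onto [0, numPartitions).
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public void close() { }
}
```

The partitioner is plugged into the producer via the standard `partitioner.class` property, e.g. `props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, KeyBasedPartitioner.class.getName());`.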

While setting the Flink parallelism, keep the following points in mind:

1) Kafka partitions == Flink parallelism: this case is ideal, since each consumer takes care of one partition. If your messages are balanced between partitions, the work will be evenly spread across the Flink operators.

2) Kafka partitions < Flink parallelism: some Flink instances won't receive any messages. To avoid that, call rebalance on your input stream before any operation, which causes the data to be re-partitioned (see the sketch after this list).

3) Kafka partitions > Flink parallelism: in this case, some instances will handle multiple partitions. Once again, you can use rebalance to spread the messages evenly across the workers.
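For cases 2 and 3, here is a minimal sketch of the rebalance call, assuming the universal Flink Kafka connector (older Flink versions use versioned classes such as FlinkKafkaConsumer09); the topic name, schema, and properties are placeholders:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class OplogJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4); // N, as in the question

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "oplog-consumer");          // placeholder

        DataStream<String> stream = env
            .addSource(new FlinkKafkaConsumer<>("oplog-topic", new SimpleStringSchema(), props))
            // Redistributes records round-robin so that no subtask sits idle.
            .rebalance();

        stream.print();
        env.execute("oplog-streaming");
    }
}
```

Note that rebalance redistributes records round-robin across subtasks, so the per-partition order is no longer preserved downstream of that call; use it only when that ordering is not needed after this point.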