
The data flow is simple:

kafka -> some logic -> kafka

The 'some logic' stage is the bottleneck, so I want to use more threads/tasks to increase throughput instead of increasing the number of Kafka partitions (currently 3). Ordering between the input and output topics doesn't matter here.

This is easy to do with Apache Storm: I can just increase the parallelism of the bolt that runs that logic. How can I do the same with Flink? More generally, is there a simple way to use different parallelism for different stages in Flink?
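Roughly, in Storm I would do something like this (a sketch only; MyKafkaSpout, LogicBolt and MyKafkaSinkBolt are placeholder classes standing in for the real spout and bolts):

import org.apache.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafka-in", new MyKafkaSpout(), 3);   // one spout executor per partition
builder.setBolt("some-logic", new LogicBolt(), 64)     // parallelism hint of 64 for the bottleneck
       .shuffleGrouping("kafka-in");
builder.setBolt("kafka-out", new MyKafkaSinkBolt(), 3)
       .shuffleGrouping("some-logic");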

1 Answer

This is quite simple in Flink. You can specify the parallelism of each operator using the setParallelism() method:

DataStream<String> rawEvents = env
  .addSource(new FlinkKafkaConsumer010<>("topic", new SimpleStringSchema(), props));

DataStream<String> mappedEvents = rawEvents
  .flatMap(new Tokenizer())
  .setParallelism(64); // run the Tokenizer flatMap with 64 parallel tasks
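
For completeness, here is a minimal end-to-end sketch of the kafka -> some logic -> kafka job with a different parallelism per stage. The broker address, group id, topic names and the Tokenizer body are placeholders, and the 0.10 connector classes are assumed to match the consumer used above:

import java.util.Properties;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;
import org.apache.flink.util.Collector;

public class KafkaParallelismJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker list
        props.setProperty("group.id", "some-logic-job");          // placeholder consumer group

        // Source with parallelism 3 to match the 3 input partitions.
        DataStream<String> rawEvents = env
            .addSource(new FlinkKafkaConsumer010<>("input-topic", new SimpleStringSchema(), props))
            .setParallelism(3);

        // The bottleneck stage gets 64 parallel tasks.
        DataStream<String> mappedEvents = rawEvents
            .flatMap(new Tokenizer())
            .setParallelism(64);

        // Sink back to Kafka with its own parallelism (3 here as an example).
        mappedEvents
            .addSink(new FlinkKafkaProducer010<>("output-topic", new SimpleStringSchema(), props))
            .setParallelism(3);

        env.execute("kafka -> some logic -> kafka");
    }

    // Stand-in for 'some logic': splits each record into whitespace-separated tokens.
    public static class Tokenizer implements FlatMapFunction<String, String> {
        @Override
        public void flatMap(String value, Collector<String> out) {
            for (String token : value.split("\\s+")) {
                out.collect(token);
            }
        }
    }
}

Because the flatMap runs at a different parallelism than the source and sink, Flink connects the stages with a network shuffle (rebalance by default) rather than chaining them, so the 3 source tasks spread records across the 64 flatMap tasks.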