
I have a Spark cluster with 17 executors in total. I have integrated Spark 2.1 with Kafka and am reading data from a topic like this:

val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "test")
  .load()

Now I want to know: when I submit my Spark application in cluster mode, how many executors (out of the total 17) will be assigned to listen to the Kafka topic and create micro-batches in Structured Streaming?

Also, how can I limit the size of a micro-batch in Structured Streaming when reading from Kafka?


1 Answer


Structured Streaming creates one Spark partition per Kafka topic partition. Since a single partition is processed by a single task (one core at a time), reading the topic will use at most as many cores as the topic has partitions, taken from the executors assigned to the application.
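As an illustration (a minimal sketch, continuing from the df in the question; the partition count of 34 is just a hypothetical value for 17 executors with 2 cores each), downstream parallelism can be raised beyond the number of topic partitions by repartitioning after the Kafka source:

// The Kafka source yields one Spark partition per topic partition, so
// work after the source can be spread over more cores by repartitioning.
val parsed = df
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .repartition(34)  // hypothetical: 17 executors * 2 cores each

This only affects the stages after the source; the read itself is still bounded by the number of Kafka partitions.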

The number of messages processed in a batch depends primarily on the trigger used (and, as a consequence, on the trigger interval, if micro-batching is used at all). However, take a look at the maxOffsetsPerTrigger option:

Rate limit on maximum number of offsets processed per trigger interval. The specified total number of offsets will be proportionally split across topicPartitions of different volume.
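For example (a minimal sketch, assuming Spark 2.1, the broker and topic from the question, and a console sink used purely for illustration), maxOffsetsPerTrigger can be combined with a processing-time trigger to cap each micro-batch:

import org.apache.spark.sql.streaming.ProcessingTime

// Cap each micro-batch at 10,000 offsets (split proportionally across the
// topic's partitions) and fire a new batch every 10 seconds.
val limited = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "test")
  .option("maxOffsetsPerTrigger", "10000")
  .load()

val query = limited
  .writeStream
  .format("console")
  .outputMode("append")
  .trigger(ProcessingTime("10 seconds"))
  .start()

With this setting, each trigger processes at most 10,000 offsets in total, regardless of how much data has accumulated in the topic.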