1 vote

I have Spark code that writes a batch to Kafka as specified here:

https://spark.apache.org/docs/2.4.0/structured-streaming-kafka-integration.html

The code looks like the following:

  df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)") \
    .write \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
    .option("topic", "topic1") \
    .save()

However, the data only gets written to Kafka partition 0. How can I get it written uniformly across all partitions of the same topic?

How many partitions does the topic actually have? - OneCricketeer
How many partitions in the topic? How many distinct keys are there in df? - mrsrinivas

1 Answer

3 votes

Kafka assigns each record to a partition based on its key: records with the same key always land in the same partition. Most likely all of your messages carry the same key, so they all hash to partition 0. To spread the data out, either give your records distinct keys, or omit the key column entirely — records with a null key are distributed across partitions by the producer (round-robin in older Kafka clients, sticky batching in newer ones).
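To see why this happens, here is a minimal sketch of how a key-based partitioner behaves. Kafka's default partitioner uses murmur2; MD5 stands in for it here, but the hash-modulo logic is the same:

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Hash the key and take it modulo the partition count, as Kafka's
    # default partitioner does (Kafka uses murmur2; MD5 is a stand-in).
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

num_partitions = 3

# Every record with the same key maps to one and the same partition.
same_key = {partition_for(b"order-42", num_partitions) for _ in range(100)}
print(same_key)  # a single partition

# Distinct keys spread across all partitions.
spread = {partition_for(f"order-{i}".encode(), num_partitions) for i in range(100)}
print(sorted(spread))  # [0, 1, 2]
```

So if every row in `df` has the same `key` value, every record hashes to the same partition, which matches the behavior you are seeing.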