1 vote

I have Spark code that writes a batch to Kafka as specified here:

https://spark.apache.org/docs/2.4.0/structured-streaming-kafka-integration.html

The code looks like the following:

  df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)") \
    .write \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
    .option("topic", "topic1") \
    .save()

However, the data only gets written to Kafka partition 0. How can I get it written uniformly across all partitions of the same topic?

How many partitions does the topic actually have? - OneCricketeer
How many partitions in the topic? How many distinct keys are there in df? - mrsrinivas

1 Answer

3 votes

Kafka assigns each record to a partition based on its key: records with the same key always land in the same partition. Most likely all of your messages carry the same key, so they all hash to partition 0. To spread the data out, either give your records distinct keys, or omit the key column entirely — records with a null key are distributed across partitions by the producer (round-robin in older Kafka clients, sticky batching in newer ones).
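To see why this happens, here is a minimal sketch of how a key-based partitioner behaves. Kafka's default partitioner uses murmur2; MD5 stands in for it here, but the hash-modulo logic is the same:

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Hash the key and take it modulo the partition count, as Kafka's
    # default partitioner does (Kafka uses murmur2; MD5 is a stand-in).
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

num_partitions = 3

# Every record with the same key maps to one and the same partition.
same_key = {partition_for(b"order-42", num_partitions) for _ in range(100)}
print(same_key)  # a single partition

# Distinct keys spread across all partitions.
spread = {partition_for(f"order-{i}".encode(), num_partitions) for i in range(100)}
print(sorted(spread))  # [0, 1, 2]
```

So if every row in `df` has the same `key` value, every record hashes to the same partition, which matches the behavior you are seeing.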