I am using Spark Structured Streaming to read from a Kafka topic.
Without any partitioning, the Spark Structured Streaming consumer can read the data.
But after I added partitions to the topic, the client shows messages from the last partition only. That is, if the topic has 4 partitions and I push the numbers 1, 2, 3, 4 into it, the client prints only 4 and none of the other values.
I am using the latest samples and binaries from the Spark Structured Streaming website.
Dataset<Row> df = spark
  .readStream()
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1")
  .load();
Am I missing anything?
Comments:

- …`--parse-keys=true`? If not, how are you checking which partitions your messages are going into? – OneCricketeer
- You can use the `GetOffsetShell` tool of Kafka to list the latest offsets of each partition. That'll tell you if messages are being sent to any/all partitions... Otherwise, if you have only one Spark executor, then it'll only consume from one Kafka partition, so you'll need to have more. – OneCricketeer
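On the first comment's point: if the producer sends records with keys, Kafka's default partitioner routes each record to a partition derived from a hash of the key, so producing everything with the same key would put all data into a single partition. Below is a minimal sketch of that routing rule, using `String.hashCode()` as a stand-in for Kafka's actual murmur2 hash (so the computed partition numbers will differ from real Kafka; the class and method names are illustrative only):

```java
public class KeyPartitionSketch {

    // Simplified model of Kafka's DefaultPartitioner for keyed records:
    // hash the key, mask off the sign bit, then take the result modulo
    // the partition count. Real Kafka hashes the serialized key bytes
    // with murmur2; String.hashCode() is used here only for illustration.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        for (String key : new String[] {"1", "2", "3", "4"}) {
            System.out.println("key=" + key
                + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

Records with a null key are not hashed at all; the producer spreads them across partitions itself (round-robin in older clients, sticky batching in newer ones), which is usually what you want when testing whether a consumer sees every partition.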