I have implemented Spark Streaming using createDirectStream. My Kafka producer is sending several messages every second to a topic with two partitions.
On the Spark Streaming side, I read Kafka messages every second and then window them with a 5-second window size and slide interval.
Kafka messages are processed correctly; I'm seeing the right computations and printed output.
But in the Spark Web UI, under the Streaming section, the number of events per window is shown as zero. Please see this image:
I'm puzzled why it is showing zero. Shouldn't it show the number of Kafka messages being fed into the stream?
Update:
This issue seems to happen when I use the groupByKeyAndWindow() API. When I commented that call out of my code, the Spark Streaming UI started reporting the Kafka event input size correctly.
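For reference, here is a minimal sketch of the kind of pipeline I described, against the Spark 1.5 / Kafka 0.8 direct-stream API. This is not my actual code; the broker address, topic name, and app name are placeholders:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaWindowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaWindowSketch")
    // 1-second batch interval, as described above
    val ssc = new StreamingContext(conf, Seconds(1))

    // Placeholder broker list and topic; the real topic has two partitions
    val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
    val topics = Set("my-topic")

    // Direct (receiver-less) Kafka stream of (key, value) pairs
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // 5-second window size and 5-second slide interval.
    // This is the call that coincides with the UI reporting zero events;
    // removing it and using e.g. stream.window(Seconds(5), Seconds(5))
    // makes the input counts show up.
    val grouped = stream.groupByKeyAndWindow(Seconds(5), Seconds(5))

    grouped.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```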
Any idea why this is so? Could this be a defect in Spark Streaming?
I'm using Cloudera CDH 5.5.1, Spark 1.5.0, and Kafka KAFKA-0.8.2.0-1.kafka1.4.0.p0.56.