I have a simple Spark Structured Streaming app that reads from Kafka and writes to HDFS. Today it mysteriously stopped working, with no code or configuration changes whatsoever (it had been running flawlessly for weeks).
So far, I have observed the following:
- App has no active, failed, or completed tasks
- App UI shows no jobs and no stages
- QueryProgress reports 0 input rows on every trigger
- QueryProgress reports that offsets from Kafka are being read and committed correctly (which suggests data is actually there)
- Data is indeed available in the topic (writing it to the console shows the data)
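For reference, the console check in the last bullet was done with a separate query roughly like the following (a sketch, with `bootstrap_servers` and the topic name as placeholders matching the snippet below):

```scala
// Hypothetical debug query: read the same topic and dump rows to the console
// to confirm data is flowing. Placeholder values, no checkpoint needed here.
val debugQuery = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", bootstrap_servers)
  .option("subscribe", "topic-name-here")
  .option("startingOffsets", "earliest") // read from the beginning to verify data exists
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console")
  .option("truncate", "false")
  .start()

debugQuery.awaitTermination()
```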
Despite all of that, nothing is being written to HDFS anymore. Code snippet:
import org.apache.spark.sql.streaming.{OutputMode, Trigger}

val inputData = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", bootstrap_servers)
  .option("subscribe", "topic-name-here")
  .option("startingOffsets", "latest")
  .option("failOnDataLoss", "false")
  .load()

inputData.toDF()
  .repartition(10)
  .writeStream
  .format("parquet")
  .option("checkpointLocation", "hdfs://...")
  .option("path", "hdfs://...")
  .outputMode(OutputMode.Append())
  .trigger(Trigger.ProcessingTime("60 seconds"))
  .start()
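The QueryProgress observations above come from the query handle; a minimal way to inspect them (assuming the `StreamingQuery` returned by `start()` is kept in a `query` variable) is:

```scala
// Assumes `query` is the StreamingQuery handle returned by start().
println(query.status)        // whether a trigger is active, and any status message
println(query.lastProgress)  // JSON with numInputRows and each source's startOffset/endOffset
```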
Any ideas why the UI shows no jobs/tasks?


