
I have a simple Spark Structured Streaming app that reads from Kafka and writes to HDFS. Today the app has mysteriously stopped working, with no code or configuration changes whatsoever (it had been running flawlessly for weeks).

So far, I have observed the following:

  • App has no active, failed or completed tasks
  • App UI shows no jobs and no stages
  • QueryProgress indicates 0 input rows every trigger
  • QueryProgress indicates offsets from Kafka were read and committed correctly (which means data is actually there)
  • Data is indeed available in the topic (writing to console shows the data)
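The console check mentioned above can be reproduced with a short sketch. This is a hedged example, not the exact code used: it assumes the same `spark` session and `bootstrap_servers` variable as the snippet below, and `"topic-name-here"` stands in for the real topic name.

```scala
// Debug sketch: read the same topic but write to the console sink instead of
// HDFS, to confirm records are actually arriving. Requires a running Spark
// session with the spark-sql-kafka connector on the classpath.
val debugQuery = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", bootstrap_servers)
  .option("subscribe", "topic-name-here")           // placeholder topic name
  .option("startingOffsets", "latest")
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console")
  .start()

// Programmatic view of the QueryProgress observations above:
// numInputRows and the committed Kafka offsets for the last trigger.
println(debugQuery.lastProgress)
```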

Despite all of that, nothing is being written to HDFS anymore. Code snippet:

val inputData = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", bootstrap_servers)
  .option("subscribe", "topic-name-here")
  .option("startingOffsets", "latest")
  .option("failOnDataLoss", "false")
  .load()

inputData.toDF()
  .repartition(10)
  .writeStream
  .format("parquet")
  .option("checkpointLocation", "hdfs://...")
  .option("path", "hdfs://...")
  .outputMode(OutputMode.Append())
  .trigger(Trigger.ProcessingTime("60 seconds"))
  .start()

Any ideas why the UI shows no jobs/tasks?

[Screenshot: No jobs for the application]

[Screenshot: No tasks and basically no activity]

[Screenshot: Query Progress]


1 Answer


For anyone facing the same issue: I found the culprit:

Somehow, the _spark_metadata directory inside the HDFS output directory I was writing to had become corrupted.

The solution was to erase that directory and restart the application, which re-created it. After that, data started flowing again.
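For reference, the cleanup can be done from the driver with the Hadoop FileSystem API. This is only a sketch under assumptions: stop the streaming query first, and note that `/path/to/output` is a hypothetical placeholder for the question's elided `hdfs://...` output path, not a real location.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hedged sketch: remove the corrupted file-sink metadata so the restarted
// query can re-create it. Uses the cluster's Hadoop configuration from the
// active Spark session.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// "/path/to/output" is a placeholder for the actual HDFS output directory.
fs.delete(new Path("/path/to/output/_spark_metadata"), true) // recursive delete
```

The same effect can be had from the command line with `hdfs dfs -rm -r` on that directory; either way, restarting the application re-creates `_spark_metadata`.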