I have a Spark Streaming job running on a cluster (Spark 1.6) that checkpoints to S3. When I first start the job, I can see the "Streaming" tab in the Spark UI. However, when I restart the job from the checkpoint, the Streaming tab disappears. The job still works as a streaming job, and I see the batches appear at the configured batch interval. See below.
If I clear out the checkpoint data, the tab comes back. I suspect that the Streaming tab is not registered correctly when the job restarts from a checkpoint.
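To make the two startup paths concrete, here is a minimal sketch of the usual checkpoint-recovery pattern, assuming the job is built around `StreamingContext.getOrCreate`. It is not the actual job code: the object name, checkpoint path, batch interval, and socket source are all placeholders.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedJob {
  // Placeholder checkpoint location (assumption); the real bucket and path are not shown here.
  val checkpointDir = "s3n://example-bucket/checkpoints/my-job"

  // Builds a brand-new context. getOrCreate calls this only when no
  // checkpoint data is found in checkpointDir (the "fresh start" path).
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointedJob")
    val ssc = new StreamingContext(conf, Seconds(10)) // placeholder batch interval
    ssc.checkpoint(checkpointDir)

    // Placeholder source and output, just so the context has a valid DStream graph.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Fresh start: createContext() runs.
    // Restart: the context is rebuilt from the checkpoint and createContext() is skipped.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

With this pattern, the first run goes through `createContext()`, while a restart with checkpoint data present rebuilds the context from the checkpoint instead. The tab goes missing only on that second path.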
I looked at the Spark Streaming code. Is it possible that the flow that registers the tab is not invoked when the application state is deserialised from a checkpoint?
Does anyone know how to fix this?