I have a two-line Spark Structured Streaming job that copies data from one Kafka topic to another.
Is it possible to publish/view the number of events consumed/produced in the Spark UI?
The "Streaming" tab in the Spark Web UI is only available for the older DStream (Direct) API, not for Structured Streaming. Starting with Spark 3.x, a dedicated Structured Streaming tab is available as well.
However, there is another easy way of displaying the number of events processed by a Spark Structured Streaming job.
You could use a StreamingQueryListener:
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.QueryProgressEvent

class CountNumRecordsListener extends StreamingQueryListener {
  override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = { }

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    // numInputRows is the number of records processed in the last micro-batch
    println(s"numInputRows: ${event.progress.numInputRows}")
  }

  override def onQueryTerminated(event: StreamingQueryListener.QueryTerminatedEvent): Unit = { }
}
With that class you can then add the listener to your streaming application (where spark is your SparkSession):
val countNumRecordsListener = new CountNumRecordsListener
spark.streams.addListener(countNumRecordsListener)
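Putting it together, a minimal sketch of the Kafka-to-Kafka copy job with the listener registered could look like the following. The broker address, topic names, and checkpoint path are assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession

object KafkaCopyJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-copy")
      .getOrCreate()

    // Register the listener before starting the query so that
    // progress of the very first micro-batch is already reported.
    spark.streams.addListener(new CountNumRecordsListener)

    val query = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
      .option("subscribe", "source-topic")                 // assumed topic
      .load()
      .select("key", "value") // Kafka sink expects key/value columns
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "target-topic")                     // assumed topic
      .option("checkpointLocation", "/tmp/kafka-copy-chk") // assumed path
      .start()

    query.awaitTermination()
  }
}
```

The listener then prints numInputRows once per completed micro-batch.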
The StreamingQueryProgress class exposes even more information to help you understand the data processing of your streaming job.
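For example, a listener could report a few more of the fields that StreamingQueryProgress exposes, such as the batch id and throughput figures (a sketch; the listener class name is mine):

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.QueryProgressEvent

class DetailedProgressListener extends StreamingQueryListener {
  override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = { }
  override def onQueryTerminated(event: StreamingQueryListener.QueryTerminatedEvent): Unit = { }

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    val p = event.progress
    println(s"batchId:                ${p.batchId}")
    println(s"numInputRows:           ${p.numInputRows}")
    println(s"inputRowsPerSecond:     ${p.inputRowsPerSecond}")
    println(s"processedRowsPerSecond: ${p.processedRowsPerSecond}")
    // p.json returns the entire progress report as a JSON string,
    // which is handy for shipping to a log aggregator:
    // println(p.json)
  }
}
```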