
I have a DStream streaming application running on Spark 2.3.1.

The application reads data from Kafka and writes to Kerberized HDFS. Batches have randomly started failing while writing to HDFS with a Kerberos-related exception, but the Spark application keeps running, so I only find out that batches are failing when I check the logs.

My question: is there a way to limit the number of consecutive batch failures? Is there a property that sets how many batch failures should be tolerated before the application itself fails? For example, with something like spark.streaming.xyz = 3, the application should stop after 3 consecutive micro-batch failures.


1 Answer


You can maintain a counter (for example, a static variable at the driver level) and increment it whenever a batch throws an exception. Once the counter reaches a chosen threshold (say, 3), close the streaming context to kill the job:

// count and threshold are fields kept on the driver
try {
    // ... process and write the micro batch ...
    count = 0;                    // reset on a successful batch
} catch (Exception e) {
    count++;                      // one more consecutive failure
    if (count > threshold) {
        streamingContext.close(); // stop the streaming context and end the job
    }
}