We are putting data files into an HDFS path that is monitored by a Spark Streaming application, and the application sends the data to a Kafka topic. We stop the streaming application in between and start it again, expecting it to resume from where it stopped, but it processes the whole input data file again. So I guess checkpointing is not being used properly. We are using Spark version 1.4.1.

How can we make the streaming application start from the point where it failed/stopped?

Thanks in advance.
1 Answer
When creating the context, use getOrCreate(checkpointDir, ...) so that previously checkpointed data, if any, is loaded instead of building a fresh context.
e.g.: JavaStreamingContext ssc = JavaStreamingContext.getOrCreate(checkpointDir, contextFactory);
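Below is a minimal sketch of that recovery pattern, assuming the Function0-based getOrCreate overload available in Spark 1.4; the HDFS paths and class name are hypothetical placeholders. The key point is that all DStream setup must happen inside the factory, because only the computation defined there is rebuilt from the checkpoint on restart:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function0;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class RecoverableStreaming {

    public static void main(String[] args) throws Exception {
        // Hypothetical paths -- replace with your own.
        final String checkpointDir = "hdfs:///user/spark/checkpoint";
        final String inputDir = "hdfs:///user/spark/input";

        // The factory runs only when no checkpoint exists. The whole
        // DStream setup must live inside it; otherwise the computation
        // is not part of the checkpoint and cannot be recovered.
        Function0<JavaStreamingContext> factory = new Function0<JavaStreamingContext>() {
            @Override
            public JavaStreamingContext call() {
                SparkConf conf = new SparkConf().setAppName("RecoverableStreaming");
                JavaStreamingContext ssc =
                        new JavaStreamingContext(conf, Durations.seconds(30));
                ssc.checkpoint(checkpointDir); // enable checkpointing

                JavaDStream<String> lines = ssc.textFileStream(inputDir);
                lines.print(); // in the real app: send the records to Kafka
                return ssc;
            }
        };

        // On a clean start this calls the factory; on restart it rebuilds
        // the context (and its progress) from the checkpoint directory.
        JavaStreamingContext ssc =
                JavaStreamingContext.getOrCreate(checkpointDir, factory);
        ssc.start();
        ssc.awaitTermination();
    }
}
```

One caveat: if you change the application code, the existing checkpoint may no longer be loadable and has to be deleted, which again causes a fresh start.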
For a complete working sample, see https://github.com/atulsm/Test_Projects/blob/master/src/spark/StreamingKafkaRecoverableDirectEvent.java