0
votes

I am using mapWithState in my Java Spark Streaming application, and I would like to avoid checkpointing because I do not want to install HDFS. I believe a checkpoint is only required for fault tolerance.

If I do not care about fault tolerance, is it possible to skip the checkpoint but still use mapWithState?

2 Answers

0
votes

It is not possible. Configuring a checkpoint directory is mandatory for the mapWithState operator. The checkpoint is not only for fault tolerance; it is also used to store the previous state of each object, typically in HDFS. One alternative you can try is to provide a local file system path as the checkpoint directory when running the application in local mode.
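As a rough sketch of the local-path alternative, the snippet below only prepares a directory on the local file system and turns it into a `file:` URI using the Java standard library. The class name `LocalCheckpointDir` and the prefix `spark-checkpoint-` are hypothetical names chosen for illustration and are not part of Spark. The resulting string is what you would pass to `JavaStreamingContext.checkpoint(String)` in your own application.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of Spark): prepares a local checkpoint directory.
class LocalCheckpointDir {

    // Creates a fresh temporary directory and returns it as a file: URI string.
    static String create(String prefix) {
        try {
            Path dir = Files.createTempDirectory(prefix);
            return dir.toUri().toString();
        } catch (IOException e) {
            // Wrap so callers do not need to declare a checked exception.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String checkpointUri = create("spark-checkpoint-");
        System.out.println(checkpointUri);
        // Assumed usage in a Spark Streaming app running in local mode:
        //   streamingContext.checkpoint(checkpointUri);
    }
}
```

A local path like this is only suitable when the driver and executors see the same file system, as they do in local mode, and the state stored there does not survive the loss of that machine.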

0
votes

Is it possible to skip the checkpoint with mapWithState?

No, at least not as of Spark 2.0.

mapWithState uses the checkpoint directory to store the state of the streaming data. That is the cost you pay for making Spark Streaming stateful. If you are building stateful applications with Spark Streaming, you should also consider using Kryo serialization.

[Image: snapshot from Databricks]