I'm confused about checkpointing in Spark Streaming. Please help, thanks!
There are two types of checkpointing: metadata checkpointing and data checkpointing. The programming guide says that data checkpointing is used when you use stateful transformations. I'm very confused about this. If I don't use stateful transformations, does Spark still write data checkpoints?
Can I control the checkpoint position in my code? In a streaming job, can I control which RDDs are written as data checkpoints, the way I can in a batch Spark job? Can I use foreachRDD(rdd => rdd.checkpoint()) in streaming?
If I don't call rdd.checkpoint(), what is Spark's default behavior? Which RDDs are written to HDFS?