When I run my Spark application, several jobs are spawned, and each job has several stages.
I'm experimenting with persisting RDDs: I persist an RDD to disk, but I can't tell whether it is actually being reused across jobs.
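Roughly what I'm doing (a simplified sketch; `RandomRDDs.normalRDD` and the map/filter bodies stand in for my real pipeline, and the `[0]`–`[3]` indices in the comments just mirror what I see in the DAG):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.random.RandomRDDs
    import org.apache.spark.storage.StorageLevel

    val sc = new SparkContext(new SparkConf().setAppName("persist-test"))

    // Build a small lineage, then persist the last map step to disk.
    val base      = RandomRDDs.normalRDD(sc, 1000000L)     // RandomRDD [0]
    val mapped    = base.map(_ * 2).map(math.abs)          // MapPartitionsRDD [1], [2]
    val persisted = mapped.persist(StorageLevel.DISK_ONLY) // green dot on [2]
    val filtered  = persisted.filter(_ > 1.0)              // Filter [3]

    filtered.count() // Job 0: computes the full lineage and writes [2] to disk
    filtered.count() // Job 1: should read [2] from disk instead of recomputing [0] -> [1]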
When I look at the DAG visualization, I do see a green dot signifying that an RDD is persisted, but I also see the preceding map/filter steps in the DAG.
For example, in the Job 0 DAG I see:
RandomRDD [0] -> MapPartitionsRDD [1] -> MapPartitionsRDD [2] (green) -> Filter [3]...
And then in the Job 1 DAG I also see:
RandomRDD [0] -> MapPartitionsRDD [1] -> MapPartitionsRDD [2] (green) -> Filter [3]...
How can I tell whether rdd[0], rdd[1], and rdd[2] were recomputed, or whether the persisted copy of rdd[2] was simply read back from disk?
In general, by looking at the job history, how can you tell whether an RDD was recomputed or simply rehydrated from its persisted copy?
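For context, I've also tried inspecting storage programmatically (a sketch using the RDD lineage dump and the storage-info API; `persisted` is the RDD from the sketch above, and I believe these calls only confirm that the data was written, not that it was reused):

    // toDebugString prints the lineage; after a job has run, the persisted RDD's
    // line includes a "CachedPartitions: n; ... DiskSize: ..." annotation.
    println(persisted.toDebugString)

    // getRDDStorageInfo lists every persisted RDD with partition counts and
    // on-disk size, confirming the persist actually happened.
    sc.getRDDStorageInfo.foreach { info =>
      println(s"${info.name}: ${info.numCachedPartitions}/${info.numPartitions} " +
        s"partitions cached, disk=${info.diskSize} bytes")
    }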