
I am running my Spark SQL application, and in the DAG for each stage I see that every RDD created internally is shown with a "Cache" operation. In my application I have a series of statements (e.g. val df1 = .....), and after all the transformations I call cache followed by count on the last DataFrame. I am trying to understand why the DAG shows "Cache" for everything. (screenshot: DAG of a stage)


1 Answer


It doesn't cache at every step. Persistence in the DAG visualization is denoted by a green circle.

"Cache" you see refers to the call point, which caused the job execution.