I am running my Spark SQL application, and in the DAG for the stages that are created, every internally created RDD is shown with a cache operation in its execution steps. In my application I have a series of statements (e.g. `val df1 = ...`), and after doing all the transformations I call `cache` followed by `count` on the last DataFrame. I am trying to understand why the DAG shows Cache for everything.

[screenshot: DAG of a stage]
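To make the setup concrete, here is a minimal sketch of what my code does (the column names and transformations are placeholders, not my actual logic; only the last DataFrame is explicitly cached):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CacheDagExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cache-dag-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A chain of transformations, each assigned to its own val.
    val df1 = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "label")
    val df2 = df1.filter($"id" > 1)
    val df3 = df2.withColumn("idSquared", $"id" * $"id")

    // cache() is called only on the final DataFrame, then count()
    // materializes it. Yet the stage DAG in the Spark UI annotates
    // every RDD in the lineage with a cache marker.
    val result = df3.cache()
    println(result.count())

    spark.stop()
  }
}
```

My expectation was that only the RDD backing `result` would show as cached, since `cache` was called once on the final DataFrame, but the DAG visualization marks each step in the lineage.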