Here is my use case.
1. I have multiple sources, df1 to df4; df3 represents an existing Hive table.
2. Build df5 from df1 to df4.
3. Insert/append df5 into that existing Hive table.
4. Save df5 to another location.
The problem is that step 4 saves nothing to the other location. Does that mean df3 changes after step 3? I already call cache() on df1 through df5, but it looks like df5 is recomputed once the source has changed. I checked the Storage tab in the Spark Web UI, and all the DataFrames are 100% cached.
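
To make steps 3 and 4 concrete, here is a rough PySpark sketch of what the job does. It is not my exact code: the function name, the table name and path arguments, the Parquet output format, and the explicit count() to fill the cache are all placeholders or assumptions, and how df5 is built from df1 to df4 is left out.

```python
def append_then_save(df5, hive_table, other_path):
    """Steps 3 and 4 from the list above, applied to an already-built df5.

    df5 is expected to be a Spark DataFrame; hive_table and other_path are
    placeholder names for the existing Hive table and the second output.
    """
    cached = df5.cache()   # mark df5 for caching
    cached.count()         # assumed: run an action so the cache is filled

    # Step 3: append df5 to the existing Hive table (the one df3 reads from).
    cached.write.mode("append").insertInto(hive_table)

    # Step 4: save the same df5 elsewhere. Parquet is an assumed format.
    cached.write.mode("overwrite").parquet(other_path)
    return cached
```

It would be called as something like `append_then_save(df5, "some_db.some_table", "/some/output/path")`, where both arguments are made-up values. The second write is the one that ends up with no data.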