This answer clearly explains RDD persist() and cache() and why they are needed: (Why) do we need to call cache or persist on a RDD
So, I understand that calling someRdd.persist(StorageLevel.DISK_ONLY) is lazy, whereas someRdd.saveAsTextFile("path") is eager.
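To make that concrete, here is roughly what I mean (a minimal, self-contained sketch; the RDD contents and the output path are just placeholders I made up):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistVsSave {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PersistVsSave").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val someRdd = sc.parallelize(1 to 1000000).map(i => (i, i * 2))

    // persist() is lazy: it only marks the RDD to be stored on disk
    someRdd.persist(StorageLevel.DISK_ONLY)
    someRdd.count()  // the first action actually materializes the blocks to disk

    // saveAsTextFile() is eager: it launches a job and writes the files immediately
    someRdd.saveAsTextFile("/tmp/some-rdd-output")  // placeholder path

    spark.stop()
  }
}
```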
But other than this (and disregarding the need to manually clean up the text file stored in HDFS), is there any other difference (performance or otherwise) between using persist() to cache the RDD to disk and manually writing it to and reading it back from disk? Is there a reason to prefer one over the other?
More Context: I came across code in our production application that manually writes an RDD to HDFS and reads it back. I've just started learning Spark and was wondering whether this can be replaced with persist(DISK_ONLY). Note that the RDD saved in HDFS is deleted before every new run of the job, and this stored data is not used for anything else between runs. The two patterns I'm comparing look roughly like the sketch below.
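A simplified sketch of the comparison (the paths, data, and function names are placeholders, not the actual production code):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

// Current pattern in the job (simplified): write the intermediate RDD out to
// HDFS, then read it back for later stages.
def manualWriteRead(sc: SparkContext): Unit = {
  val intermediate = sc.parallelize(1 to 100).map(i => s"record-$i")
  intermediate.saveAsTextFile("hdfs:///tmp/intermediate")   // eager write to HDFS
  val reloaded = sc.textFile("hdfs:///tmp/intermediate")    // read back later in the job
  println(reloaded.count())
}

// What I'm considering instead: keep the same RDD and let Spark store its
// blocks on the executors' local disks via DISK_ONLY persistence.
def withPersist(sc: SparkContext): Unit = {
  val intermediate = sc.parallelize(1 to 100).map(i => s"record-$i")
  intermediate.persist(StorageLevel.DISK_ONLY)
  println(intermediate.count())  // first action materializes the on-disk blocks
  println(intermediate.count())  // subsequent actions reuse the blocks instead of recomputing
}
```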