We have a standalone Spark cluster. When there is not enough memory to cache an RDD, Spark spills the data to disk. Where exactly is the data spilled when there is no HDFS? The local disk of each worker node?
As far as I know, all spilled data is written to the local scratch directories defined by spark.local.dir (which defaults to /tmp), independent of any HDFS access. So yes: it goes to the local disk of each worker node. In standalone mode, the SPARK_LOCAL_DIRS environment variable on a worker overrides this setting.
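For example, you can point the scratch space at dedicated disks in spark-defaults.conf on each worker. A comma-separated list spreads spill I/O across multiple disks (the paths below are just illustrative):

```
# spark-defaults.conf (example paths, adjust to your mounts)
spark.local.dir /mnt/disk1/spark-tmp,/mnt/disk2/spark-tmp
```

Make sure the Spark user has write access to these directories and that they have enough free space for shuffle and spill files.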