I am trying to understand if spark is an alternative to the vanilla MapReduce approach for analysis of BigData. Since spark saves the operations on the data in the memory so while using the HDFS as storage system for spark , does it take the advantage of distributed storage of the HDFS? For instance suppose i have 100GB CSV file stored in HDFS, now i want to do analysis on it. If i load that from HDFS to spark , will spark load the complete data in-memory to do the transformations or it will use the distributed environment for doing its jobs that HDFS provides for Storage which is leveraged by the MapReduce programs written in hadoop. If not then what is the advantage of using spark over HDFS ?
PS: I know spark spills on the disks if there is RAM overflow but does this spill occur for data per node(suppose 5 GB per node) of the cluster or for the complete data(100GB)?