I'm getting confused about spill to disk
and shuffle write
. Using the default Sort shuffle manager, we use an appendOnlyMap
for aggregating and combine partition records, right? Then when execution memory fill up, we start sorting map, spilling it to disk and then clean up the map for the next spill(if occur), my questions are :
What is the difference between spill to disk and shuffle write? They consist basically in creating file on local file system and also record.
Admit are different, so Spill records are sorted because the are passed through the map, instead shuffle write records no because they don't pass from the map.
- I have the idea that the total size of the spilled file, should be equal to the size of the Shuffle write, maybe I'm missing something, please help to understand that phase.
Thanks.
Giorgio