
After reading some online forums and Stack Overflow questions, this is what I understood:

Data is spilled when an executor runs out of memory, and Shuffle Spill (Memory) is the size of the deserialized form of the data in memory at the time it is spilled.

I am running Spark locally, and I have set the Spark driver memory to 10g.

If my understanding is correct, then whenever a groupBy operation needs more than 10 GB of execution memory, it has to spill data to disk.

Suppose a groupBy operation needs 12 GB of memory. Since driver memory is set to 10 GB, it has to spill roughly 2 GB of data to disk, so Shuffle Spill (Disk) should be 2 GB and Shuffle Spill (Memory) should be the remaining 10 GB, because Shuffle Spill (Memory) is the size of the data in memory at the time of the spill.

If my understanding is correct, then Shuffle Spill (Memory) <= executor memory. In my case that is the driver memory, since I am running Spark locally.

But it seems I am missing something; below are the values from the Spark UI.

    Total Time Across All Tasks: 41 min
    Locality Level Summary: Process local: 45
    Input Size / Records: 1428.1 MB / 42783987
    Shuffle Write: 3.8 GB / 23391365
    Shuffle Spill (Memory): 26.7 GB
    Shuffle Spill (Disk): 2.1 GB

Even though I set the Spark driver memory to 10g, how can the shuffle spill in memory be more than the memory allocated to the driver?

I observed the memory consumption in Windows Task Manager, and it never exceeded 10.5 GB while the job was running, so how can Shuffle Spill (Memory) possibly be 26.7 GB?

DAG: [DAG visualization screenshot]

Event timeline: 45 tasks, because there are 45 partitions for 4.25 GB of data.

[Event timeline screenshot]

Here is the code that I am trying to run, which is the solution to my previous problem.
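(The original snippet is not reproduced above, so the following is only a rough sketch of the kind of job being described, assuming local mode, ~4.25 GB of input split into 45 partitions, and a groupBy aggregation; the paths, object name, and column name are placeholders, not taken from the question.)

    import org.apache.spark.sql.SparkSession

    object GroupBySpillSketch {
      def main(args: Array[String]): Unit = {
        // Driver memory (10g in the question) is assumed to be set externally, e.g.
        //   spark-submit --driver-memory 10g --master "local[*]" ...
        val spark = SparkSession.builder()
          .appName("groupBy-spill-sketch")
          .getOrCreate()

        // Placeholder input path and schema; the question describes ~4.25 GB of data
        // split into 45 partitions.
        val df = spark.read
          .option("header", "true")
          .csv("/path/to/input")
          .repartition(45)

        // A wide aggregation goes through the shuffle; when execution memory runs out,
        // in-memory data is spilled to disk, which is what the Shuffle Spill metrics record.
        df.groupBy("key")
          .count()
          .write
          .mode("overwrite")
          .parquet("/path/to/output")

        spark.stop()
      }
    }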


1 Answer


> I observed the memory consumption in Windows Task Manager, and it never exceeded 10.5 GB while the job was running, so how can Shuffle Spill (Memory) possibly be 26.7 GB?

That's because the metric is accumulated over the whole task: every spill that happens during the task adds to it. So if spilling occurred three times, each time with 10 GB in memory, the total could be as much as 30 GB, even though no more than 10 GB was ever held in memory at once.
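To make the accumulation concrete, here is a minimal sketch (not from the original answer) of a SparkListener that sums TaskMetrics.memoryBytesSpilled and diskBytesSpilled over task completions, which is essentially how the stage page arrives at its Shuffle Spill figures; each task's counters already include every spill that happened during that task.

    import java.util.concurrent.atomic.AtomicLong

    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    class SpillListener extends SparkListener {
      private val memorySpilled = new AtomicLong(0L)
      private val diskSpilled   = new AtomicLong(0L)

      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        val metrics = taskEnd.taskMetrics
        if (metrics != null) {
          // Each task's counters already include every spill that happened during
          // that task; summing them across tasks reproduces the stage-level figure.
          memorySpilled.addAndGet(metrics.memoryBytesSpilled)
          diskSpilled.addAndGet(metrics.diskBytesSpilled)
        }
      }

      def report(): String =
        s"Shuffle Spill (Memory): ${memorySpilled.get} bytes, " +
          s"Shuffle Spill (Disk): ${diskSpilled.get} bytes"
    }

Register it with spark.sparkContext.addSparkListener(new SpillListener) before the job runs and print report() afterwards. Because the memory counter is incremented once per spill with the in-memory size at that moment, the same data can be counted more than once, so the total can legitimately exceed the driver or executor memory.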