Lately I've been tuning the performance of some large, shuffle-heavy Spark jobs. In the Spark UI, I noticed a metric called "Shuffle Read Blocked Time" under the additional metrics section. For a large swath of tasks, it seems to account for upwards of 50% of the task duration.
While I can intuit some possibilities for what this means, I can't find any documentation that explains what it actually represents. Needless to say, I also haven't been able to find any resources on mitigation strategies.
Can anyone provide some insight into what "Shuffle Read Blocked Time" actually measures, and how I might go about reducing it?