I am running Spark on a Google Cloud Dataproc cluster. Writing a Dataset to GCS gets stuck with 1 pending task that never ends

Question

I am running Spark on a Google Cloud Dataproc cluster. While writing a Dataset to a GCS (Google Cloud Storage) bucket, the job gets stuck on the last partition, which never finishes.

It shows that 799/800 tasks are completed, but the 1 pending task never ends.

Yayati Sule · Accepted Answer · 2020-06-25T04:34:37

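This is mainly caused by data skew: one partition holds far more records than the others, so the task that processes it keeps running long after the rest have finished.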
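To make the skew idea concrete, here is a minimal plain-Python sketch that imitates a hash partitioner and measures how uneven the partitions are. It is not Spark code and not the asker's job: the function names `partition_sizes` and `skew_ratio`, the key values, and the record counts are all made up for illustration. In Spark itself, a comparable check would be grouping by the partitioning or join key and counting rows per key.

```python
from collections import Counter

def partition_sizes(keys, num_partitions):
    """Count how many records a simple hash partitioner sends to each partition."""
    sizes = Counter()
    for key in keys:
        # Integer keys are used here so that hash() is deterministic across runs.
        sizes[hash(key) % num_partitions] += 1
    return [sizes.get(p, 0) for p in range(num_partitions)]

def skew_ratio(sizes):
    """Largest partition divided by the mean partition size (1.0 means perfectly even)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0

# Hypothetical data: 10,000 distinct keys plus one "hot" key repeated 50,000 times.
keys = list(range(10_000)) + [42] * 50_000
sizes = partition_sizes(keys, num_partitions=800)

hot_partition = sizes.index(max(sizes))
print("largest partition:", hot_partition, "records:", sizes[hot_partition])
print("skew ratio:", round(skew_ratio(sizes), 2))
```

A ratio far above 1 means a single task receives most of the data, which matches the "799/800 done, 1 task pending" symptom.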

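Also, if you are doing joins, check whether the columns used for the join contain null values. In an outer join, all rows with a null key are shuffled to the same partition, and with a null-safe comparison (<=>) the null rows match each other much like a cross join, so a single task can end up with most of the work.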
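The effect of null join keys can be sketched in plain Python as well. This is only an illustration under assumed data: `null_share` and `join_row_count` are hypothetical helpers, and the key lists are invented. It counts the rows an inner equi-join would produce, first with ordinary SQL equality (where NULL never equals NULL), then with null-safe matching, and finally after filtering out the null keys.

```python
from collections import Counter

def null_share(keys):
    """Fraction of keys that are null (None)."""
    return sum(k is None for k in keys) / len(keys) if keys else 0.0

def join_row_count(left_keys, right_keys, null_safe=False):
    """Count output rows of an inner equi-join on the given key lists.

    With ordinary SQL equality, null keys never match. With null-safe
    matching, every null on the left pairs with every null on the right.
    """
    right_counts = Counter(right_keys)
    total = 0
    for key in left_keys:
        if key is None and not null_safe:
            continue  # NULL = NULL is not true, so the row produces no match
        total += right_counts[key]
    return total

# Hypothetical data: 3 real keys and 1,000 null keys on each side.
left = [1, 2, 3] + [None] * 1000
right = [1, 2, 3] + [None] * 1000

plain = join_row_count(left, right)
null_safe = join_row_count(left, right, null_safe=True)

# One possible remedy: drop (or handle separately) the null keys before joining.
left_clean = [k for k in left if k is not None]
right_clean = [k for k in right if k is not None]
filtered = join_row_count(left_clean, right_clean, null_safe=True)

print("null share on the left side:", round(null_share(left), 3))
print("rows:", plain, null_safe, filtered)
```

If the share of null keys is large, filtering them out before the join and handling them separately afterwards is one common way to keep the last task from being overloaded.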
