I am running Spark on a Google Cloud Dataproc cluster. While writing a Dataset to a GCS (Google Cloud Storage) bucket, the job gets stuck on the last partition and never finishes.

It shows 799/800 tasks completed, but the one remaining task never finishes.

1 Answer

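This is usually caused by data skew: one partition holds far more rows than the others, so its task keeps running long after the rest have finished.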

Also, if the job involves joins, check whether the join columns contain null values. Rows with null keys can all land in the same partition, and if the join treats nulls as equal they match each other like a cross join, which can leave one task with most of the work.
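
To see why a single task can lag this way, here is a minimal plain-Python sketch of the mechanism. It is not Spark code and makes several assumptions for illustration: the partition count of 8, the made-up `join_keys` data, and Python's built-in `hash` standing in for Spark's own hash partitioner. It only shows how rows sharing one key value (here a missing key) pile into one partition.

```python
from collections import Counter

NUM_PARTITIONS = 8  # assumed for illustration; the job in the question has 800 tasks


def rows_per_partition(keys, num_partitions=NUM_PARTITIONS):
    """Assign each row to hash(key) % num_partitions and count rows per partition."""
    counts = Counter(hash(key) % num_partitions for key in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical join column: 1,000 distinct integer ids plus 1,000 rows with no key.
join_keys = list(range(1000)) + [None] * 1000

# Step 1: find the key value that carries the most rows.
heaviest = Counter(join_keys).most_common(1)

# Step 2: see how the rows spread across partitions with the null keys included.
skewed = rows_per_partition(join_keys)

# Step 3: set the null-key rows aside and check the spread again.
balanced = rows_per_partition([k for k in join_keys if k is not None])

print("heaviest key:", heaviest)                                   # [(None, 1000)]
print("largest / smallest partition with nulls:", max(skewed), min(skewed))        # 1125 125
print("largest / smallest partition without nulls:", max(balanced), min(balanced))  # 125 125
```

In this toy data one partition ends up with nine times as many rows as any other, which is the same shape as 799 tasks finishing quickly and one running on. On the real job, the equivalent of step 1 would be grouping by the join column and counting rows per value to see whether null or some other single value dominates.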