I am running Spark on a Google Cloud Dataproc cluster. While writing a Dataset to a GCS (Google Cloud Storage) bucket, the job gets stuck on the last partition and never finishes.
It shows that 799/800 tasks are completed, but the one pending task never ends.
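Also, if the job involves joins, check whether the columns used for the join contain null values. Rows with a null key can all land in the same partition (for example in an outer join), or match each other many-to-many if the join uses null-safe equality. Either case can leave one task far larger than the rest, which would look like a single task that never finishes.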
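To see why null keys can produce one oversized task, here is a minimal plain-Python sketch of hash partitioning. It is not Spark's actual partitioner: the `partition_sizes` helper, the sample keys, and the choice to put `None` in bucket 0 are all assumptions made for illustration. The only point is that equal keys, including missing ones, always go to the same partition.

```python
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count rows per partition under a simple hash partitioner.

    Illustrative only: Spark uses its own hash function, but any hash
    partitioner sends equal keys (including None) to the same partition.
    """
    sizes = Counter()
    for key in keys:
        # Assumption of this sketch: a missing key always maps to bucket 0.
        bucket = 0 if key is None else hash(key) % num_partitions
        sizes[bucket] += 1
    return [sizes[p] for p in range(num_partitions)]


# Hypothetical join keys: 8 distinct ids plus 92 rows with a missing key.
keys = list(range(1, 9)) + [None] * 92
sizes = partition_sizes(keys, num_partitions=8)
print(sizes)
print("largest partition holds", max(sizes), "of", len(keys), "rows")
```

If the real data looks like this, one thing to try is dropping or separately handling the null-key rows before the join, for example with `df.filter(col("key").isNotNull())` in PySpark, where `key` stands in for your actual join column.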