I launch an EMR cluster with the following specs:
- 1 master node: m4.4xlarge with 32 GB of EBS storage
- 10 core nodes: m4.4xlarge with 1024 GB of EBS storage
- Auto-termination after the last job completes
A Spark job is submitted to the cluster. It reads data from S3 and saves its output to S3.
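For reference, the setup above could be expressed as a request for the EMR `RunJobFlow` API, for example passed to `boto3.client("emr").run_job_flow(**request)`. This is a minimal sketch, not the asker's actual configuration: it assumes the job is submitted as an EMR step via `spark-submit`, and the release label, S3 script path, cluster/step names, gp2 volume type and default IAM role names are all hypothetical placeholders. It only builds and prints the request, so it runs without AWS credentials.

```python
import json

# Hypothetical values: the question does not state these.
RELEASE_LABEL = "emr-5.30.0"
SCRIPT_S3_PATH = "s3://my-bucket/jobs/my_job.py"


def ebs_config(size_gb):
    """One EBS volume of the given size attached to each instance in the group."""
    return {
        "EbsBlockDeviceConfigs": [
            {
                "VolumeSpecification": {"VolumeType": "gp2", "SizeInGB": size_gb},
                "VolumesPerInstance": 1,
            }
        ]
    }


def build_cluster_request():
    """Build a RunJobFlow request mirroring the specs in the question."""
    return {
        "Name": "spark-s3-job",  # hypothetical cluster name
        "ReleaseLabel": RELEASE_LABEL,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {
                    "Name": "Master",
                    "InstanceRole": "MASTER",
                    "InstanceType": "m4.4xlarge",
                    "InstanceCount": 1,
                    "EbsConfiguration": ebs_config(32),
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": "m4.4xlarge",
                    "InstanceCount": 10,
                    "EbsConfiguration": ebs_config(1024),
                },
            ],
            # False: terminate the cluster once no steps remain to run.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "spark-job",  # hypothetical step name
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster", SCRIPT_S3_PATH],
                },
            }
        ],
        # Assumed default EMR role names.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    print(json.dumps(build_cluster_request(), indent=2))
```

With `KeepJobFlowAliveWhenNoSteps` set to `False`, shutdown is tied to the step finishing, and the step only finishes when the `spark-submit` process exits. So if the driver lingers after the Spark jobs are done, that could plausibly show up as the delay described below.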
After several attempts, the pattern is always the same: the Spark job finishes in about 1 hour and 15 minutes (the jobs show as completed in the Spark Web UI and the output is in S3, as expected). But the EMR cluster then hangs for 20 to 30 minutes before shutting down, so overall the run takes about 1 hour and 45 minutes.
Why does the EMR cluster take so long to terminate after the last job completes?
Try adding `sys.exit(0)` at the end of your code to force termination. - Glennie Helles Sindholt