0
votes

The code saves a SQLContext DataFrame result as Parquet files to an S3 folder. This Spark job causes a lot of connections to be opened. Even though the Spark job has finished, there are still a lot of CLOSE_WAIT connections on AWS EMR. I have called spark.close and sc.close, yet the connections remain in the CLOSE_WAIT state on port 4040.
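For reference, this is roughly how the lingering connections can be counted. It is only a sketch: the helper name `count_close_wait` is made up for illustration, and it assumes the usual column layout of `netstat -tan` or `ss -tan`, where the local address is the fourth column.

```python
def count_close_wait(socket_listing, port=4040):
    """Count CLOSE_WAIT sockets whose local port matches `port`.

    `socket_listing` is the text output of `netstat -tan` or `ss -tan`.
    Assumption: the local address is the fourth whitespace-separated column,
    which is the usual layout for both tools.
    """
    suffix = ":" + str(port)
    count = 0
    for line in socket_listing.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # header or malformed line
        # netstat prints CLOSE_WAIT, ss prints CLOSE-WAIT
        in_close_wait = any(f in ("CLOSE_WAIT", "CLOSE-WAIT") for f in fields)
        if in_close_wait and fields[3].endswith(suffix):
            count += 1
    return count
```

Feeding it the captured output of `netstat -tan` on the driver node before and after the job gives a quick way to see whether the number keeps growing.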

2 Answers

0
votes

4040 is the default port for the Spark UI (on the driver). I don't think 4040 has anything to do with the job itself. But if you have called spark.close and sc.close, these ports should be closed (or you can terminate the application, and the process will release all of its ports).

0
votes

We found the fix: in our code we disabled the spark.ui.enabled parameter. By default it is set to 'true'. To keep these connections from being opened, it needs to be set to 'false'.
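A minimal PySpark sketch of that setting, in case it helps. The helper names and the app name `parquet-to-s3` are placeholders rather than our actual code; the only part taken from the fix is `spark.ui.enabled` set to `false`.

```python
def ui_disabled_conf():
    """Spark settings that keep the driver from starting the web UI (port 4040)."""
    return {"spark.ui.enabled": "false"}


def build_session(app_name="parquet-to-s3"):
    """Create a SparkSession with the UI disabled.

    Requires pyspark; the import is kept local so the settings helper
    above can be used without it. `app_name` is a placeholder.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in ui_disabled_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

The same setting can also be passed at submit time with `--conf spark.ui.enabled=false`. Note that with the UI disabled, the job page on port 4040 is no longer available for debugging.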

The issue we faced was that too many open connections degraded our EMR query performance, and over time it affected our Spark performance too.