The code saves a SQLContext DataFrame result as Parquet files to an S3 folder. This Spark job causes a lot of connections to be opened. Even after the Spark job has finished, there are many connections left in the CLOSE_WAIT state on AWS EMR. I have called spark.close and sc.close, yet the connections remain in CLOSE_WAIT on port 4040.
2 Answers
We found the fix: in our code we disabled the spark.ui.enabled parameter. By default it is set to true. Port 4040 is the default port of the Spark web UI, so to stop these connections from being opened we need to set the parameter to false.
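For anyone who wants to apply the same setting at submit time rather than in the job code, here is a minimal sketch that builds a spark-submit command with `--conf spark.ui.enabled=false`. The helper `build_spark_submit` and the script name `save_parquet_job.py` are hypothetical, made up for illustration; only the `--conf` flag and the `spark.ui.enabled` property come from Spark itself.

```python
import shlex


def build_spark_submit(script, ui_enabled=False):
    """Return a spark-submit argv list that sets spark.ui.enabled explicitly.

    `script` is the path of the job to run (hypothetical in this example).
    """
    return [
        "spark-submit",
        # Spark expects the lowercase strings "true" / "false" here.
        "--conf", f"spark.ui.enabled={str(ui_enabled).lower()}",
        script,
    ]


# Hypothetical job script; replace with the real one.
cmd = build_spark_submit("save_parquet_job.py")
print(shlex.join(cmd))
```

The same property can also be set inside the job, for example through the session builder's `config("spark.ui.enabled", "false")` call, as long as it is set before the SparkContext is created.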
The problem we faced was that too many open connections degraded our EMR query performance and, over time, our Spark performance as well.