0
votes

I have a small Hadoop/Yarn cluster that is running on a system with a firewall that must be enabled. We are trying to submit Spark jobs that fail because of the port allocations.

I've configured the firewall for all the standard Hadoop/Yarn/Spark ports that need to be opened, and set what I thought were all the configurations needed to restrict the port ranges. But the application master still creates containers on random ports that get blocked.

The one setting I thought would do the trick was yarn.app.mapreduce.am.job.client.port-range, set in mapred-site.xml, but it doesn't seem to be respected or make any difference.
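For reference, this is roughly how that property was set (a sketch of the mapred-site.xml entry; the range shown is an example value):

```xml
<!-- mapred-site.xml: restricts the port range the MapReduce
     application master uses for client communication.
     The range 50100-50200 is an example value. -->
<property>
  <name>yarn.app.mapreduce.am.job.client.port-range</name>
  <value>50100-50200</value>
</property>
```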

Any thoughts/help would be greatly appreciated. Banged my head on the wall way too long on this one.

Edit: Forgot versions - Hadoop/Yarn 2.8.0, Spark 2.1.0, CentOS 7

1
Have you thought about enabling all communications in iptables for the subnet the cluster nodes are running on? - Serhiy
That wasn't our first choice, but it's looking like our only one. Implementing that now. - ksdaly

1 Answer

0
votes

yarn.app.mapreduce.am.job.client.port-range only applies to MapReduce applications running on Yarn, so Spark jobs ignore it.

You can configure the port range for Spark applications on Yarn by setting spark.driver.port and spark.port.maxRetries in spark-defaults.conf. Spark binds to the configured port and, on failure, retries on the next port, incrementing by one per retry; the following values therefore restrict the driver to ports 50100-50199 (the starting port plus up to 99 retries):

spark.driver.port 50100
spark.port.maxRetries 99
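If you'd rather not edit spark-defaults.conf, the same properties can be passed per job with --conf (a sketch; the application class and jar path are placeholders for your own job):

```shell
# Submit a Spark job to YARN with the driver restricted to ports 50100-50199.
# com.example.MyApp and my-app.jar are placeholders for your application.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.driver.port=50100 \
  --conf spark.port.maxRetries=99 \
  --class com.example.MyApp \
  my-app.jar
```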