
I am running Spark 2.4.4 on YARN. The Spark configuration on the NodeManagers looks like this:

spark-defaults.conf:

spark.driver.port=38429
spark.blockManager.port=35430
spark.driver.blockManager.port=44349

When the Spark Driver and Executors are created, they pick up the driver port (38429) config, but not the blockManager (35430) / driver.blockManager (44349) config. The blockManager ports are assigned randomly.

Driver:

14:23:40 INFO spark.SparkContext: Running Spark version 2.4.4
14:23:40 INFO util.Utils: Successfully started service 'sparkDriver' on port **38429**.
14:23:41 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38171.
14:23:41 INFO netty.NettyBlockTransferService: Server created on driverhost:**38171**

Executor:

14:23:44 INFO client.TransportClientFactory: Successfully created connection to driverhost:**38429** after 73 ms (0 ms spent in bootstraps)
14:23:45 INFO executor.Executor: Starting executor ID 1 on host ...
14:23:45 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34914.
14:23:45 INFO netty.NettyBlockTransferService: Server created on executorhost:**34914**

I have come across a Jira bug describing this issue, but it was raised against Spark 2.4.0 and closed 12 months ago: https://issues.apache.org/jira/browse/SPARK-27139

Looking at the Spark code on GitHub, I can't spot anything obvious:

https://github.com/apache/spark/blob/branch-2.4/core/src/main/scala/org/apache/spark/SparkEnv.scala

val blockManagerPort = if (isDriver) {
  conf.get(DRIVER_BLOCK_MANAGER_PORT)
} else {
  conf.get(BLOCK_MANAGER_PORT)
}

val blockTransferService =
  new NettyBlockTransferService(conf, securityManager, bindAddress, advertiseAddress,
    blockManagerPort, numUsableCores)

https://github.com/apache/spark/blob/branch-2.4/core/src/main/scala/org/apache/spark/internal/config/package.scala

private[spark] val BLOCK_MANAGER_PORT = ConfigBuilder("spark.blockManager.port")
  .doc("Port to use for the block manager when a more specific setting is not provided.")
  .intConf
  .createWithDefault(0)

private[spark] val DRIVER_BLOCK_MANAGER_PORT = ConfigBuilder("spark.driver.blockManager.port")
  .doc("Port to use for the block manager on the driver.")
  .fallbackConf(BLOCK_MANAGER_PORT)

Can anyone tell me why my NettyBlockTransferService ports are being assigned randomly, and not 35430 or 44349?

Can you try passing these configs via the spark-submit command line? – mazaneicha
That does appear to have worked, thanks. It seems like a bug to me, unless there is something wrong with my configuration, but I can work around it with this approach: INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44349. INFO netty.NettyBlockTransferService: Server created on driverhost:44349 – Phillip Pienaar
In addition, and I'm not sure why that is, the Spark docs recommend separating keys and values in spark-defaults.conf with whitespace rather than an equals sign: spark.apache.org/docs/latest/… – mazaneicha
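For reference, the whitespace-separated form of the same spark-defaults.conf entries suggested in the last comment would look like this:

spark.driver.port                 38429
spark.blockManager.port           35430
spark.driver.blockManager.port    44349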

1 Answer


The problem here was setting this config on the YARN NodeManagers. It needs to be set on the client, i.e. the process that submits the Spark app, not on the cluster itself.
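In practice that means passing the ports with --conf on the spark-submit command line (as in the comment above), or setting them when the application builds its SparkSession. A minimal sketch of the programmatic route, assuming a simple application that creates its own session (the app name is made up):

import org.apache.spark.sql.SparkSession

// Set the ports on the client side, before the SparkContext starts, rather than
// relying on the spark-defaults.conf shipped on the YARN NodeManagers.
val spark = SparkSession.builder()
  .appName("block-manager-port-example")              // hypothetical app name
  .config("spark.driver.port", "38429")
  .config("spark.blockManager.port", "35430")
  .config("spark.driver.blockManager.port", "44349")
  .getOrCreate()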