
I have a small cluster set up for development purposes: 3 VMs, each with Spark 2.3 installed. I started the master on VM1 and the slaves on the other two VMs, pointing them at the master's IP address. The firewall is up on all 3 VMs, and we opened the port range 38001:38113 in each firewall.
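For reference, opening that range might look like this, assuming firewalld (the question does not say which firewall tool is in use):

  # open TCP ports 38001-38113 on each VM and reload the rules
  sudo firewall-cmd --permanent --add-port=38001-38113/tcp
  sudo firewall-cmd --reload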

Before starting the VMs, we made the following configuration changes.

On the Master, Worker 1 & Worker 2 nodes

The spark-defaults.conf file was updated with the following properties:

  • spark.blockManager.port 38001
  • spark.broadcast.port 38018
  • spark.driver.port 38035
  • spark.executor.port 38052
  • spark.fileserver.port 38069
  • spark.replClassServer.port 38086
  • spark.shuffle.service.port 38103
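As the comment below points out, several of these keys are leftovers from Spark 1.x and are silently ignored in Spark 2.3. A sketch of the same file keeping only the entries that still take effect (port values taken from the question):

  # spark-defaults.conf (Spark 2.3)
  spark.blockManager.port     38001
  spark.driver.port           38035
  spark.shuffle.service.port  38103   # only used if the external shuffle service is enabled
  # spark.broadcast.port, spark.executor.port, spark.fileserver.port and
  # spark.replClassServer.port were removed in Spark 2.0 and have no effect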

On the Worker 1 & Worker 2 nodes

The spark-env.sh file was updated with the following property:

  • SPARK_WORKER_PORT=38112 -- for worker-1
  • SPARK_WORKER_PORT=38113 -- for worker-2
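In spark-env.sh this is typically written as an exported shell variable, e.g. on worker-1:

  # spark-env.sh (worker-1); worker-2 uses 38113 instead
  export SPARK_WORKER_PORT=38112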

When we start spark-shell and run a sample CSV read, the executor launched on the worker comes up with a random port for the Spark driver.

E.g.:

Spark Executor Command: "/usr/java/jdk1.8.0_171-amd64/jre/bin/java" "-cp" "/opt/spark/2.3.0/conf/:/opt/spark/2.3.0/jars/*" "-Xmx1024M" "-Dspark.driver.port=34573" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://[email protected]:34573" "--executor-id" "1" "--hostname" "293.72.146.384" "--cores" "4" "--app-id" "app-20180706072052-0000" "--worker-url" "spark://[email protected]:38112"

As you can see in the command above, the executor started with spark.driver.port set to 34573, and this port is chosen randomly every time. Because of this my program fails, as the executor is unable to communicate with the driver.

Can anyone help me with a configuration that works in a locked-down network environment where all other ports are blocked?
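Note: spark.driver.port is read by the driver process itself, so it only takes effect on the machine where spark-shell is launched. A sketch of pinning the ports explicitly at launch, using Spark's standard --conf flags and the values from the question (<master-host> is a placeholder; spark.port.maxRetries is Spark's standard cap on how far it probes above a base port):

  spark-shell --master spark://<master-host>:7077 \
    --conf spark.driver.port=38035 \
    --conf spark.blockManager.port=38001 \
    --conf spark.port.maxRetries=16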

Thanks in advance.

Where did you pick that list of properties? Most are deprecated since Spark 1.6 or Spark 2.0... Cf. stackoverflow.com/questions/27729010/… - Samson Scharfrichter

1 Answer


Start the worker with an explicit port:

./start-slave.sh spark://hostname:port -p [Worker Port]

Options:

  • -c CORES, --cores CORES: Number of cores to use
  • -m MEM, --memory MEM: Amount of memory to use (e.g. 1000M, 2G)
  • -d DIR, --work-dir DIR: Directory to run apps in (default: SPARK_HOME/work)
  • -i HOST, --ip IP: Hostname to listen on (deprecated, please use --host or -h)
  • -h HOST, --host HOST: Hostname to listen on
  • -p PORT, --port PORT: Port to listen on (default: random)
  • --webui-port PORT: Port for web UI (default: 8081)
  • --properties-file FILE: Path to a custom Spark properties file (default: conf/spark-defaults.conf)
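For example, matching the worker ports from the question (the master address 192.168.1.1 is an assumption; 7077 is the standalone master's default port):

  ./start-slave.sh spark://192.168.1.1:7077 --port 38112   # worker-1
  ./start-slave.sh spark://192.168.1.1:7077 --port 38113   # worker-2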