
My problem concerns the connection between the slaves on other nodes and the master. I have 3 nodes set up as follows:

  • 1 node with the master and 1 worker running in the same Docker container
  • 2 nodes with 1 worker each, also in Docker containers

The docker-compose file opens these ports:

version: '2'
services:
  spark:
    image: xxxxxxxx/spark
    tty: true
    stdin_open: true
    container_name: spark
    volumes:
      - /var/data/dockerSpark/:/var/data
    ports:
      - "7077:7077"             # master RPC port
      - "127.0.0.1:8080:8080"   # master web UI
      - "7078:7078"             # worker port
      - "127.0.0.1:8081:8081"   # worker web UI
      - "127.0.0.1:9010:9010"
      - "4040:4040"             # application web UI
      - "18080:18080"           # history server
      - "6066:6066"             # REST submission port
      - "9000:9000"

The conf/spark-env.sh is as follows:

 #export STANDALONE_SPARK_MASTER_HOST=172.xx.xx.xx # this is the Docker IP address on the node
 #export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST
 export SPARK_WORKER_MEMORY=7g
 export SPARK_EXECUTOR_MEMORY=6G
 export SPARK_WORKER_CORES=4
 export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=86400 -Dspark.worker.cleanup.appDataTtl=86400"

My problem is the connection from the slaves on the other nodes to the master, so I begin by starting the master with sbin/start-master.sh. During my first attempts the first 2 lines were commented out, and the master started at the address spark://c96____37fb:7077. I connected the nodes successfully using these commands:

  • sbin/start-slave.sh spark://c96____37fb:7077 --port 7078 for the collocated slave
  • sbin/start-slave.sh spark://masterNodeIP:7077 --port 7078 for the two other slaves

All the ports cited previously are redirected from the master node to the corresponding Docker container.
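
One way to confirm which address the master actually bound to is to grep its log inside the container (a sketch; it assumes the container is named spark as in the compose file above and that SPARK_HOME is set in the image):

 docker exec spark sh -c 'grep "Starting Spark master" "$SPARK_HOME"/logs/*Master*.out'
 # expected output, e.g.:
 # ... Master: Starting Spark master at spark://c96____37fb:7077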

The web UI then showed that my cluster had 3 connected workers. Unfortunately, when running a job, only the collocated worker actually did any work; the other two continuously disconnected from and reconnected to the application without doing anything.

Next I tried changing STANDALONE_SPARK_MASTER_HOST, first (1) to the master node's IP, in which case the master didn't start, and then (2) to the 172.xxx address, which is the Docker IP address inside the master node. The second attempt worked, and the web UI showed the address spark://172.xx.xx.xx:7077. The slaves then connected successfully, but again the two external slaves showed no sign of activity.
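
For reference, in Spark 2.x the current variable name is SPARK_MASTER_HOST (SPARK_MASTER_IP is deprecated); the two attempts correspond roughly to this sketch of conf/spark-env.sh:

 # attempt 2: Docker bridge IP inside masterNode -- master starts
 export SPARK_MASTER_HOST=172.xx.xx.xx
 # attempt 1: host IP -- master fails to start, presumably because this
 # address is not a local interface inside the container
 #export SPARK_MASTER_HOST=masterNodeIP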

Edit

Spark SPARK_PUBLIC_DNS and SPARK_LOCAL_IP on stand-alone cluster with docker containers gives me part of the answer, but not the one I want: by adding network_mode: "host" to docker-compose.yml I succeeded in building my cluster with STANDALONE_SPARK_MASTER_HOST=ipNodeMaster and connecting the slaves to it. Execution was OK but stopped at a collect operation with the error org.apache.spark.shuffle.FetchFailedException: Failed to connect to xxx/yy.yy.yy.yy:36801, which looks like a port issue.
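
The 36801 in that error looks like one of Spark's randomly chosen data ports, which Docker cannot know to publish. These can be pinned with standard Spark properties; the port numbers below are only examples, untested on this setup:

 spark-submit \
   --conf spark.driver.port=40000 \
   --conf spark.blockManager.port=40010 \
   --conf spark.port.maxRetries=4 \
   ...

Each pinned port (plus the few ports above it, up to spark.port.maxRetries) would then have to be published in the ports: section of the compose file.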

But my real concern is that I don't want to run the Spark master container on the host network of the master node, but on its own Docker network ("bridge").
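
For that bridge setup, here is a sketch of what the compose file might look like; using SPARK_PUBLIC_DNS as the externally advertised address follows the linked question, and everything here is an untested assumption:

 version: '2'
 services:
   spark:
     image: xxxxxxxx/spark
     networks:
       - sparknet
     environment:
       SPARK_PUBLIC_DNS: masterNodeIP   # address advertised to the other nodes
     ports:
       - "7077:7077"
       - "7078:7078"
       - "40000:40000"   # pinned driver port from the sketch above
       - "40010:40010"   # pinned block manager port
 networks:
   sparknet:
     driver: bridge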

Thank you for your wise advice!

For your exact problem I have no idea, but one piece of advice: you do not need to expose all those ports in the docker-compose config. They are already reachable by every other member of the docker-compose subnetwork, so if you do not need to reach any of them externally from the Docker host, you can remove them.

1 Answer


I tried it the other way, by launching spark-class directly in the containers on the master and slave VMs.

  version: "2"
   services:
    spark-master:
     image: spark/mastertool:2.2.1
      command: /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master
      hostname: spark-master

      environment:
        MASTER: spark://localhost:port
        SPARK_CONF_DIR: /opt/conf
        SPARK_PUBLIC_DNS: xxx.xxx.xxx.xxx

      ports:
       - 8080:8080
       - 7077:7077
       - 6066:6066
       - 4040:4040

The above is the docker-compose file for the master VM.

The one below is the docker-compose file for the slave VM:

version: "2"

services:
 spark-worker:
  image: spark/workertool:2.2.1
   command: /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://private-ip-of-master_cluster:xxxx
   hostname: spark-worker
   environment:
     SPARK_CONF_DIR: /conf
     SPARK_PUBLIC_DNS: xx.xxx.xx.xxx
     SPARK_WORKER_CORES: 2
     SPARK_WORKER_MEMORY: 2g
     SPARK_WORKER_PORT: xxxx
     SPARK_WORKER_WEBUI_PORT: xxxx

   ports:
     - 8xxx:xxxx
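
Assuming each file above is saved as that VM's docker-compose.yml, the cluster then comes up with:

 # on the master VM
 docker-compose up -d spark-master

 # on each worker VM
 docker-compose up -d spark-worker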