My problem is the connection between slaves on other nodes and the master. I have a 3-node setup as follows:
- 1 node with the master and 1 worker launched in the same Docker container
- 2 nodes with 1 worker each, running in Docker
The docker-compose file opens these ports (how I think they map onto Spark's defaults is sketched right after the file):
version: '2'
services:
spark:
image: xxxxxxxx/spark
tty: true
stdin_open: true
container_name: spark
volumes:
- /var/data/dockerSpark/:/var/data
ports:
- "7077:7077"
- "127.0.0.1:8080:8080"
- "7078:7078"
- "127.0.0.1:8081:8081"
- "127.0.0.1:9010:9010"
- "4040:4040"
- "18080:18080"
- "6066:6066"
- "9000:9000"
The conf/spark-env.sh is as follows:
#export STANDALONE_SPARK_MASTER_HOST=172.xx.xx.xx # This is the Docker IP address on the node
#export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST
export SPARK_WORKER_MEMORY=7g
export SPARK_EXECUTOR_MEMORY=6G
export SPARK_WORKER_CORES=4
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=86400 -Dspark.worker.cleanup.appDataTtl=86400"
As stated above, my problem is the connection of the slaves on the other nodes to the master, so I begin by starting the master with sbin/start-master.sh. In my first attempts the first 2 lines above were commented out and the master started at the address spark://c96____37fb:7077. I successfully connected the workers using these commands:
- sbin/start-slave.sh spark://c96____37fb:7077 --port 7078 for the co-located slave
- sbin/start-slave.sh spark://masterNodeIP:7077 --port 7078 for the two other slaves
All the ports cited above are forwarded from nodeMaster to the corresponding Docker container.
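In case it matters, plain connectivity to the master looks fine; since the workers do register, something like this succeeds from the external nodes before I start the worker (the nc call is only a reachability check, and --webui-port is optional):
nc -zv masterNodeIP 7077
sbin/start-slave.sh spark://masterNodeIP:7077 --port 7078 --webui-port 8081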
The web UI then showed that my cluster had 3 connected workers. Unfortunately, when a job actually runs, only the co-located worker does any work; the two others continuously disconnect from and reconnect to the application without doing anything.
Next I tried setting STANDALONE_SPARK_MASTER_HOST (1) to the master node's IP (masterNodeIP), but then the master would not start, and (2) to the 172.xx.xx.xx address, which is the Docker IP inside the master node. The second attempt worked and the web UI showed the address spark://172.xx.xx.xx:7077. The slaves then connected successfully, but once again the two external slaves showed no sign of activity.
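If I understand the bind failure correctly, the master cannot listen on the master node's IP because that address does not exist inside the bridge-network container; only the 172.xx.xx.xx one does. What I imagine I would need is something along these lines in conf/spark-env.sh (SPARK_LOCAL_IP and SPARK_PUBLIC_DNS are standard spark-env.sh variables; whether this is the right combination is exactly my question):
export SPARK_LOCAL_IP=172.xx.xx.xx     # address the master can actually bind to inside the container
export SPARK_MASTER_IP=172.xx.xx.xx
export SPARK_PUBLIC_DNS=masterNodeIP   # address advertised externally (web UI / workers)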
Edit
Spark SPARK_PUBLIC_DNS and SPARK_LOCAL_IP on stand-alone cluster with docker containers gives me part of the answer, but not the one I want: by adding network_mode: "host" to the docker-compose.yml I managed to build my cluster with STANDALONE_SPARK_MASTER_HOST=ipNodeMaster and to connect the slaves to it. Execution was OK but stopped at a collect operation with this error: org.apache.spark.shuffle.FetchFailedException: Failed to connect to xxx/yy.yy.yy.yy:36801, which looks like a port issue.
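If the problem is that the executors open random ports for the driver and block manager connections, my understanding is that those can be pinned with standard Spark properties and then opened or published explicitly; a rough sketch (the port numbers are arbitrary and not tested on this setup):
spark-submit \
  --conf spark.driver.port=40000 \
  --conf spark.blockManager.port=40020 \
  --conf spark.port.maxRetries=16 \
  ... # rest of the submit command unchanged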
But my real concern is that I don't want to run the Spark master container on the host network of the master node; I want it on its own Docker network ("bridge").
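To make the question concrete, the end state I am after is roughly this (a sketch only; it assumes the pinned ports from above and that the image picks up SPARK_PUBLIC_DNS from its environment):
docker run -d --name spark \
  -p 7077:7077 -p 8080:8080 -p 7078:7078 -p 8081:8081 \
  -p 40000-40036:40000-40036 \
  -e SPARK_PUBLIC_DNS=masterNodeIP \
  xxxxxxxx/spark
That is: the master stays on the bridge network, the ports Spark needs are published explicitly, and the address advertised to the workers is the node's IP rather than the container's.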
Thank you for your wise advice!