I've installed an Apache Flink cluster on our network, configured as shown below. The master (JobManager) starts and sends the start command to all the slaves via SSH. I can see that the TaskManagers are running after the master node has started them.
Config file on all nodes:
jobmanager.rpc.address: flmaster
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 2048
taskmanager.numberOfTaskSlots: 1
taskmanager.memory.preallocate: false
parallelism.default: 1
jobmanager.web.port: 8081
taskmanager.tmp.dirs: /apps/storage/runtime/flink/workspace
recovery.mode: zookeeper
recovery.zookeeper.quorum:zk1:2181, zk2:2181, zk3:2181
recovery.zookeeper.storageDir: /apps/runtime/flink/recovery
env.java.home: /apps/java/
Then I have a file called slaves in the config folder with a list of the slave nodes:
flSlave1
flSlave2
flSlave3
I then start the cluster:
../bin/start-cluster.sh
This opens an SSH session to each of the slave nodes and starts the TaskManager there. I can confirm this with ps ax | grep java on the slaves.
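For anyone who wants to repeat that process check across all nodes in one go, here is a minimal sketch. It assumes passwordless SSH to each host (which start-cluster.sh already relies on) and that the TaskManager JVM's command line contains the string "TaskManager". The function names list_slaves and check_taskmanagers are made up for illustration and are not part of Flink.

```shell
# list_slaves FILE: print every host in a slaves file,
# skipping blank lines and lines that start with '#'.
list_slaves() {
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# check_taskmanagers FILE: for each host, count the processes whose
# command line mentions "TaskManager" (assumption: that is how the JVM shows up in ps).
check_taskmanagers() {
  for host in $(list_slaves "$1"); do
    # '[T]askManager' keeps grep from counting its own process.
    count=$(ssh -o BatchMode=yes "$host" "ps ax | grep -c '[T]askManager'" 2>/dev/null)
    echo "$host: ${count:-unreachable} TaskManager process(es)"
  done
}
```

It would be called with the path to the slaves file, for example check_taskmanagers /apps/flink/conf/slaves (that path is a guess based on the install location used in this post). A count of 1 per host only shows that the process is alive, not that it has registered with the JobManager.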
I can open the web UI on flMaster:8081. There I can see that the slave node count is 0, so I have no task managers. As a test, I started the WordCount.jar job, and it tells me it cannot run the job since there are no free slots.
/apps/flink/bin/flink run /apps/flink/examples/batch/WordCount.jar
The response:
07/20/2016 13:19:01 Job execution switched to status FAILING.
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job.
Well, I guess if there are no task managers/slave nodes, there will be no slots.
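The count the web UI shows can also be read from the command line, which makes it easier to re-check after each restart. This is only a sketch: it assumes the JobManager's web monitor serves a JSON summary at /overview containing a "taskmanagers" field, and the helper name count_taskmanagers is hypothetical.

```shell
# count_taskmanagers: read a JSON document on stdin and print the
# numeric value of its "taskmanagers" field (prints nothing if absent).
count_taskmanagers() {
  sed -n 's/.*"taskmanagers"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Example call (assumes the web monitor is reachable at flMaster:8081):
#   curl -s http://flMaster:8081/overview | count_taskmanagers
```

If this prints 0 while the TaskManager processes are running on the slaves, the processes are up but have not registered with the JobManager, which matches what I am seeing.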
Has anyone seen this issue before?