I am new to Flink and am trying to deploy it on an EMR cluster. I am using a 3-node cluster (1 master and 2 slaves) and have kept the default configuration without making any changes.

I am curious to understand the following points:

  1. How do the master and slaves communicate with each other, given that I have not listed any IPs in conf/slaves on the master node?

  2. I can see a Flink library on the master node (path: /usr/lib/flink) but cannot find one on the slave nodes. How is my code getting executed on the slave nodes?

  3. I will change some settings in conf/flink-conf.yaml to suit my requirements, if needed. Do I need to make any other changes on the master or slave nodes apart from this?

1 Answer

See the "Running flink-crawler in EMR" wiki page for details on how we run a Flink streaming job on top of EMR. Note that in this mode Flink runs via YARN, so Flink's conf/slaves file isn't used. You should also take a look at the YARN Setup documentation to better understand how Flink runs on top of YARN.
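
On your third point, here is a minimal sketch of the kind of edit you might make to the Flink configuration file (conf/flink-conf.yaml in the Flink distribution) on the master node. The two keys are standard Flink options, but the values are purely illustrative assumptions, not EMR defaults or recommendations. When Flink runs on YARN, the client uploads the Flink jar and the configuration so the YARN containers can use them, which is also why you don't see a Flink install on the slave nodes. Changes made on the master should therefore generally be enough, though I'd verify this against your EMR release.

```yaml
# Illustrative values only (assumptions); tune them for your instance types.

# Number of parallel task slots offered by each TaskManager container.
taskmanager.numberOfTaskSlots: 2

# Parallelism used by jobs that do not set one explicitly.
parallelism.default: 4
```

Settings changed this way apply to Flink sessions or jobs started after the edit; anything already running on YARN keeps the configuration it was launched with.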