2 votes

I'm deploying my Flink app on AWS EMR 6.2

Flink version: 1.11.2

I configured the step with:

Arguments: "flink-yarn-session -d -n 2"

as described here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/flink-jobs.html

Both the cluster and the step are in state 'running'.

The controller log shows:

INFO startExec 'hadoop jar /mnt/var/lib/hadoop/steps/s-309XLKSZGN4V4/trigger-service-***.jar flink-yarn-session -d -n 2'

However, the syslog shows that the application didn't pick up the YARN context:

2021-02-24 13:19:57,772 INFO org.apache.flink.runtime.minicluster.MiniCluster (main): Starting Flink Mini Cluster

I logged the class name returned by StreamExecutionEnvironment.getExecutionEnvironment() in my app, and indeed it is LocalStreamEnvironment.
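For reference, the check was roughly this (a minimal sketch; the class and the printout are just illustrative):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EnvCheck {
    public static void main(String[] args) {
        // On YARN this should be a remote/context environment, not LocalStreamEnvironment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        System.out.println("Execution environment: " + env.getClass().getName());
    }
}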

The application itself runs properly on the JobMaster instance as a local app.


1 Answer

1 vote

It turns out that -n is deprecated in newer versions of Flink, so, per the advice from AWS support, I used this command instead:

flink-yarn-session -d -s 2  

Arguments meaning:

-d,--detached Start detached

-s,--slots Number of slots per TaskManager
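
For reference, the session can also be sized with further flags; something along these lines should work as well (the memory values and session name below are purely illustrative placeholders, check flink-yarn-session -h on your cluster):

flink-yarn-session -d -s 2 -jm 1024m -tm 2048m -nm my-flink-session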

In addition, in order to run the Flink job it was necessary to create two steps:

--steps '[{"Args":["flink-yarn-session","-d","-s","2"],"Type":"CUSTOM_JAR","MainClass":"","ActionOnFailure":"CANCEL_AND_WAIT","Jar":"command-runner.jar","Properties":"","Name":"Yarn Long Running Session"},{"Args":["bash","-c","aws s3 cp s3://<URL of Application Jar> .;/usr/bin/flink run -m yarn-cluster -p 2 -d <Application Jar>"],"Type":"CUSTOM_JAR","MainClass":"","ActionOnFailure":"CANCEL_AND_WAIT","Jar":"command-runner.jar","Properties":"","Name":"Flink Job"}]'

The first uses command-runner.jar to execute

flink-yarn-session -d -s 2

and the second uses command-runner.jar to download my application jar from S3 and run the job with Flink in yarn-cluster mode.
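
Spelled out, the second step effectively runs the following on the master node:

bash -c "aws s3 cp s3://<URL of Application Jar> . ; /usr/bin/flink run -m yarn-cluster -p 2 -d <Application Jar>"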