I am using EMR 5.30.0 and trying to submit a Flink (1.10.0) job using the following command:

flink run -m yarn-cluster /home/hadoop/flink--test-0.0.1-SNAPSHOT.jar

and I am getting the following error:

Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment. 

After going through the logs on the worker nodes and the job manager logs, it looks like there is a port conflict:

2020-06-17 21:40:51,199 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Could not start cluster entrypoint YarnJobClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint YarnJobClusterEntrypoint.
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
        at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:119)
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
        at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
        ... 2 more
Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 8081
        at org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:219)
        at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:165)
        ... 9 more

There seems to be a JIRA ticket (https://issues.apache.org/jira/browse/FLINK-15394) open for this (though it is for Flink 1.9), and the suggested solution is to use a port range for rest.bind-port in the Flink config file.

However, in Flink 1.10 we only have the following in the YARN conf YAML file:

rest.port: 8081

Another issue I am facing: I submitted multiple Flink jobs (the same job multiple times) using the AWS Console via the Add Step UI. Only one of the jobs succeeded, and the rest failed with the error posted above. When I go to the Flink UI, it doesn't show any jobs at all.

I am wondering whether each of the submitted jobs is trying to create its own Flink YARN session instead of using the existing one.

Thanks, Sateesh

1 Answer

I was able to resolve it. There was a port conflict, so I had to comment out rest.port: 8081 and use a range of ports via rest.bind-port instead:

#rest.port: 8081
rest.bind-port: 50100-50200
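
To illustrate why a range helps, here is a minimal Python sketch of the general idea. It is not Flink code; the helper names parse_port_range and bind_first_free are hypothetical, and binding to 127.0.0.1 is an assumption made for the demo. With a single fixed port, the second process started on the same host cannot bind. With a range, it can fall through to the next free port.

```python
import socket


def parse_port_range(spec):
    """Parse 'A-B' (or a single port 'A') into an inclusive range of ints."""
    low, _, high = spec.partition("-")
    start = int(low)
    end = int(high) if high else start
    return range(start, end + 1)


def bind_first_free(spec, host="127.0.0.1"):
    """Return a listening socket bound to the first free port in spec.

    Hypothetical helper: tries each port in order and raises OSError
    if none of them can be bound.
    """
    for port in parse_port_range(spec):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen(1)
            return sock
        except OSError:
            # Port is taken (or otherwise unavailable); try the next one.
            sock.close()
    raise OSError(f"Could not bind to any port in port range {spec}")


if __name__ == "__main__":
    # Two "endpoints" on the same host, both configured with the same range.
    first = bind_first_free("50100-50200")
    second = bind_first_free("50100-50200")
    print("first endpoint port:", first.getsockname()[1])
    print("second endpoint port:", second.getsockname()[1])
    first.close()
    second.close()
```

This would also explain why only one of several identical submissions succeeded with rest.port: 8081: whichever one bound 8081 first on a node kept it, and the others on that node had no alternative port to try.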

Thanks