0
votes

We run Flink in Kubernetes 1.8 in AWS. It's been fine for months. I've setup a new k8s clusters. Everything the same EXCEPT we enabled Calico (instead of using only Flannel)

Just like Flannel, Calico gives us networking between containers.

Since enabling Calico, Flink client receive this error when trying to send a jar file to job manager:

org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Could not upload the jar files to the job manager. Caused by: java.io.IOException: Could not retrieve the JobManager's blob port. Caused by: java.io.IOException: PUT operation failed: Connection reset Caused by: java.net.SocketException: Connection reset

and Job manager says:

java.lang.IllegalArgumentException: Invalid BLOB addressing for permanent BLOBs 2018-03-27 06:28:16,069 INFO org.apache.flink.runtime.jobmanager.JobManager - Submitting job 11433fc332c7d76100fd08e6d1b623b4 (flink-job-connectivity-test). 2018-03-27 06:28:16,085 INFO org.apache.flink.runtime.jobmanager.JobManager - Using restart strategy NoRestartStrategy for 11433fc332c7d76100fd08e6d1b623b4. 2018-03-27 06:28:16,096 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job recovers via failover strategy: full graph restart 2018-03-27 06:28:16,105 INFO org.apache.flink.runtime.jobmanager.JobManager - Running initialization on master for job flink-job-connectivity-test (11433fc332c7d76100fd08e6d1b623b4). 2018-03-27 06:28:16,105 INFO org.apache.flink.runtime.jobmanager.JobManager - Successfully ran initialization on master in 0 ms. 2018-03-27 06:28:16,117 ERROR org.apache.flink.runtime.jobmanager.JobManager - Failed to submit job 11433fc332c7d76100fd08e6d1b623b4 (ignite-flink-job-connectivity-test) java.lang.NullPointerException at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:58) at org.apache.flink.runtime.checkpoint.CheckpointStatsTracker.(CheckpointStatsTracker.java:121) at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:228) at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1277) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:447) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)

It looks like the file cannot be transferred from the client to the job manager. I believe Invalid BLOB addressing is because the job manager did not receive any file.

Everything is the same. Works on one cluster. Does not work on another. Ports are configured the same. Every artefact is the same.

We don't have any NetworkPolicy. But would Calico enabled have some form of effect on networking?

1

1 Answers

0
votes

Problem solved. I added this to my Flink task manager manifest file

  • name: data port: 6121
  • name: rpc port: 6122
  • name: query port: 6125

And this in the flink conf files :

taskmanager.data.port: 6121

So basically I pinned a data port for task manager. I had done that for the job manager (blob server port). And it was fine. But it looks like Calico works differently than Flannel and it could not use a random data port for task manager