1
votes

I have a cluster with:

  • 1 TaskManager
  • 1 StandaloneJob / JobManager
  • Config: taskmanager.numberOfTaskSlots: 1

If I set parallelism.default: 4 on a job with the Flink PubSub source, I keep getting this message when starting my "job cluster"/TaskManager:

[analytics-job-cluster-7bd4586ccb-s5hmp job] 2019-05-01 16:22:30,888 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: Custom Source -> Process -> Timestamps/Watermarks -> app_events (1/4) of job 00000000000000000000000000000000 is not in state RUNNING but SCHEDULED instead. Aborting checkpoint.

However, if I point the same job at a bunch of files, it works perfectly. What does this mean?

Hey, can you provide some more logs? Also, what is the exact configuration of your cluster? How many TaskManagers, JobManagers, and task slots do you have? - Dominik Wosiński
@DominikWosiński I've updated the question with some more of my config. I was under the impression that you only needed one slot per TaskManager and could let parallelisable items (such as (:Stream).keyBy) parallelise themselves? - Henrik

1 Answer

1
votes

The issue is that the cluster needs at least as many task slots as the job's parallelism. With only 1 TaskManager and only 1 task slot, Flink cannot start a job with parallelism 4 properly, because there are simply not enough slots for it. The source subtasks then remain in SCHEDULED rather than RUNNING, which is what the checkpoint log message reports. If you set numberOfTaskSlots for that TaskManager equal to the parallelism, it should work.
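
As a rough sketch of that fix, assuming the settings live in the TaskManager's flink-conf.yaml and that 4 is the parallelism you actually want (both are assumptions, since the full configuration is not shown in the question), the two keys would line up like this:

```yaml
# flink-conf.yaml (illustrative fragment, not the asker's actual file)

# Slots offered by each TaskManager. With a single TaskManager,
# this is also the total number of slots in the cluster.
taskmanager.numberOfTaskSlots: 4

# Default parallelism for jobs that do not set their own.
# Should not exceed the total number of available slots.
parallelism.default: 4
```

The other direction should work too: keep taskmanager.numberOfTaskSlots: 1 and lower the parallelism to 1, or keep one slot per TaskManager and run four TaskManagers so the cluster still offers four slots in total.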