1
votes

I plan to upgrade from Flink 1.5.2 to 1.6.0 and then migrate the jobs. To minimize downtime for the jobs, I plan to run both Flink clusters at the same time and stop the old one once the jobs have been migrated successfully. However, when I tried to stop the old cluster by running stop-cluster.sh in the directory Flink1.5.2/bin, the cluster that was stopped was Flink 1.6.0 instead of the expected Flink 1.5.2.

I ran some tests and found that stop-cluster.sh always stops the most recently started Flink cluster. For example, if you start cluster 1.6.0 first and then start Flink 1.5.2, running stop-cluster.sh stops Flink 1.5.2 first, even if you run it from the 1.6.0 directory Flink1.6.0/bin. I expected that running stop-cluster.sh from Flink1.6.0/bin would stop cluster 1.6.0, and that running it from Flink1.5.2/bin would stop cluster 1.5.2, but that is not what happens.

I did some research and found that stop-cluster.sh kills the process listed in a file that contains the pid. However, I don't know where that file is located, and I suspect both clusters write their pids to the same place when they start, which confuses stop-cluster.sh.

Please advise how to stop a specific cluster.


1 Answer

3
votes

By default, the pid file is written to /tmp and is named flink-<USER>-<FLINK_COMPONENT>.pid. You can change the directory by setting the env.pid.dir option in flink-conf.yaml. By giving each cluster a different pid file directory, you can stop each cluster independently.
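
As a rough sketch of what that could look like for the two installations in the question: both run as the same user and use the default /tmp, so they end up sharing one pid file, which would explain the behavior you saw. The directories below are hypothetical placeholders, not defaults. Pick any location that the user running Flink can write to.

```yaml
# Flink1.5.2/conf/flink-conf.yaml
# Hypothetical directory; any path writable by the Flink user should work.
env.pid.dir: /var/run/flink-1.5.2

# Flink1.6.0/conf/flink-conf.yaml
# A different hypothetical directory, so the two clusters no longer share a pid file.
env.pid.dir: /var/run/flink-1.6.0
```

The setting most likely needs to be in place before a cluster is started. A cluster that is already running recorded its pid under the old location, so you may have to stop it once with the old configuration, or kill it manually, before the separate directories take effect.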