Akka Cluster manual join

Question

I'm trying to find a workaround to the following limitation: When starting an Akka Cluster from scratch, one has to make sure that the first seed node is started. It's a problem to me, because if I have an emergency to restart all my system from scratch, who knows if the one machine everything relies on will be up and running properly? And I might not have the luxury to take time changing the system configuration. Hence my attempt to create the cluster manually, without relying on a static seed node list.

Now it's easy for me to have all Akka systems registering themselves somewhere (e.g. a network filesystem, by touching a file periodically). Therefore when starting up a new system could

Look up the list of all systems that are supposedly alive (i.e. who touched the file system recently).
a. If there is none, then the new system joins itself, i.e. starts the cluster alone. b. Otherwise it tries to join the cluster with Cluster(system).joinSeedNodes using all the other supposedly alive systems as seeds.
If 2. b. doesn't succeed in reasonable time, the new system tries again, starting from 1. (looking up again the list of supposedly alive systems, as it might have changed in the meantime; in particular all other systems might have died and we'd ultimately fall into 2. a.).

I'm unsure how to implement 3.: How do I know whether joining has succeeded or failed? (Need to subscribe to cluster events?) And is it possible in case of failure to call Cluster(system).joinSeedNodes again? The official documentation is not very explicit on this point and I'm not 100% how to interpret the following in my case (can I do several attempts, using different seeds?):

An actor system can only join a cluster once. Additional attempts will be ignored. When it has successfully joined it must be restarted to be able to join another cluster or to join the same cluster again.

Finally, let me precise that I'm building a small cluster (it's just 10 systems for the moment and it won't grow very big) and it has to be restarted from scratch now and then (I cannot assume the cluster will be alive forever).

Thx

Nad78 Nad78 · Accepted Answer · 2017-07-17T09:33:12

I'm answering my own question to let people know how I sorted out my issues in the end. Michal Borowiecki's answer mentioned the ConstructR project and I built my answer on their code.

How do I know whether joining has succeeded or failed? After issuing Cluster(system).joinSeedNodes I subscribe to cluster events and start a timeout:

private case object JoinTimeout
...
Cluster(context.system).subscribe(self, InitialStateAsEvents, classOf[MemberUp], classOf[MemberLeft])
system.scheduler.scheduleOnce(15.seconds, self, JoinTimeout)

The receive is:

val address = Cluster(system).selfAddress
...
case MemberUp(member) if member.address == address =>
  // Hooray, I joined the cluster!
case JoinTimeout =>
  // Oops, couldn't join
  system.terminate()

Is it possible in case of failure to call Cluster(system).joinSeedNodes again? Maybe, maybe not. But actually I simply terminate the actor system if joining didn't succeed and restart it for another try (so it's a "let it crash" pattern at the actor system level).

Akka Cluster manual join

3 Answers