We built an Akka Cluster infrastructure for SMS, email and push notifications. The system has three kinds of nodes: client, sender and lighthouse. The client role is used by our web application and our API application (both hosted in IIS). The lighthouse and sender roles are hosted as Windows services. IIS recycles the app pools of the web and API applications. To handle this, the client nodes shut down their actor system in the Stop event of global.asax.cs and start it again in the Start event. The logs show that the system shuts down successfully and then rejoins the cluster.
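The restart sequence described above can be sketched as a toy Python model. This is not Akka.NET code, and the class and method names are hypothetical. It rests on one assumption: every start of the client node creates a new actor-system incarnation with a fresh UID, and the previous incarnation is always shut down before the next one starts.

```python
import itertools


class ClientNodeHost:
    """Toy stand-in (hypothetical, not Akka.NET) for the IIS-hosted client node."""

    def __init__(self) -> None:
        # Real UIDs are random; sequential ones keep the sketch deterministic.
        self._next_uid = itertools.count(1)
        self.current_uid = None  # UID of the running actor system, if any
        self.events = []         # ordered log of lifecycle events

    def on_start(self) -> None:
        # Mirrors the Start handler in global.asax.cs: create a new system.
        if self.current_uid is not None:
            raise RuntimeError("previous actor system is still running")
        self.current_uid = next(self._next_uid)
        self.events.append(("started", self.current_uid))

    def on_stop(self) -> None:
        # Mirrors the Stop handler: shut the running system down, if any.
        if self.current_uid is not None:
            self.events.append(("shut down", self.current_uid))
            self.current_uid = None


host = ClientNodeHost()
host.on_start()  # first app-pool start
host.on_stop()   # app-pool recycle: Stop ...
host.on_start()  # ... then Start again
print(host.events)  # [('started', 1), ('shut down', 1), ('started', 2)]
```

The point of the model is that each recycle produces a distinct incarnation (UID 1, then UID 2), even when the host and port stay the same.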
Sometimes, however, when the app pool recycles, the client ActorSystem starts but cannot join the cluster, and our notifications stop working (which is a huge problem for us). If we manually shut down the ActorSystem and start it again, it joins the cluster. This happens roughly every two days.
The logs show that the client does join the cluster before the error:
Node [akka.tcp://NotificationSystem@...:41350] is JOINING, roles [client]
Leader is moving node [akka.tcp://NotificationSystem@...:41350] to [Up]
Right after the client joins the cluster, the logs show the following error:
Shut down address: akka.tcp://NotificationSystem@...:41350
Akka.Remote.ShutDownAssociation: Shut down address: akka.tcp://NotificationSystem@...:41350 ---> Akka.Remote.Transport.InvalidAssociationException: The remote system terminated the association because it is shutting down.
   --- End of inner exception stack trace ---
   at Akka.Remote.EndpointWriter.PublishAndThrow(Exception reason, LogLevel level)
   at Akka.Remote.EndpointWriter.b__20_0(Exception ex)
   at Akka.Actor.LocalOnlyDecider.Decide(Exception cause)
   at Akka.Actor.OneForOneStrategy.Handle(IActorRef child, Exception x)
   at Akka.Actor.SupervisorStrategy.HandleFailure(ActorCell actorCell, Exception cause, ChildRestartStats failedChildStats, IReadOnlyCollection`1 allChildren)
   at Akka.Actor.ActorCell.HandleFailed(Failed f)
   at Akka.Actor.ActorCell.SystemInvoke(Envelope envelope)
--- End of stack trace from previous location where exception was thrown ---
   at Akka.Actor.ActorCell.HandleFailed(Failed f)
   at Akka.Actor.ActorCell.SystemInvoke(Envelope envelope)
After that error, we see the following message:
Association to [akka.tcp://NotificationSystem@...:41350] having UID [226948907] is irrecoverably failed. UID is now quarantined and all messages to this UID will be delivered to dead letters. Remote actorsystem must be restarted to recover from this situation.
The system does not recover on its own. Only restarting the client actor system fixes it.
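The quarantine message above describes a specific rule. The failed association is tied to one UID, everything sent to that UID goes to dead letters, and only a restarted actor system, which gets a new UID, becomes reachable again. The toy Python model below sketches just that rule. It is not Akka.NET code, and all class, method and message names are hypothetical. It assumes the restarted client keeps the same address and that only its UID changes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Incarnation:
    # Toy identity of one actor-system run: its address plus the UID it
    # generated at startup (hypothetical model, not Akka.NET types).
    address: str
    uid: int


@dataclass
class ToyRemoting:
    quarantined: set = field(default_factory=set)
    delivered: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)

    def quarantine(self, target: Incarnation) -> None:
        # In this model, quarantine is keyed on address AND uid and never lifted.
        self.quarantined.add(target)

    def send(self, target: Incarnation, message: str) -> None:
        # Messages to a quarantined incarnation end up in dead letters.
        if target in self.quarantined:
            self.dead_letters.append((target, message))
        else:
            self.delivered.append((target, message))


remoting = ToyRemoting()
client = Incarnation("akka.tcp://NotificationSystem@...:41350", 226948907)
remoting.quarantine(client)
remoting.send(client, "notification")      # goes to dead letters

restarted = Incarnation(client.address, 1)  # hypothetical fresh UID after restart
remoting.send(restarted, "notification")   # delivered again
print(len(remoting.dead_letters), len(remoting.delivered))  # 1 1
```

The model shows why waiting does not help. The old UID stays quarantined until a new incarnation replaces it, which matches the observation that only a manual restart of the client actor system recovers.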
Our client role configuration is:
<akka>
  <hocon>
    <![CDATA[
      akka {
        loglevel = DEBUG
        actor {
          provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
          deployment {
            /coordinatorRouter {
              router = round-robin-group
              routees.paths = ["/user/NotificationCoordinator"]
              cluster {
                enabled = on
                max-nr-of-instances-per-node = 1
                allow-local-routees = off
                use-role = sender
              }
            }
          }
          serializers {
            wire = "Akka.Serialization.WireSerializer, Akka.Serialization.Wire"
          }
          serialization-bindings {
            "System.Object" = wire
          }
          debug {
            receive = on
            autoreceive = on
            lifecycle = on
            event-stream = on
            unhandled = on
          }
        }
        remote {
          helios.tcp {
            transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
            applied-adapters = []
            transport-protocol = tcp
            hostname = "***.***.**.**"
            port = 0
          }
        }
        cluster {
          seed-nodes = ["akka.tcp://NotificationSystem@***.***.**.**:5053", "akka.tcp://NotificationSystem@***.***.**.**:5073"]
          roles = [client]
        }
      }
    ]]>
  </hocon>
</akka>
Our sender role configuration is:
<akka>
  <hocon><![CDATA[
    akka {
      loglevel = INFO
      loggers = ["Akka.Logger.NLog.NLogLogger, Akka.Logger.NLog"]
      actor {
        debug {
          # receive = on
          # autoreceive = on
          # lifecycle = on
          # event-stream = on
          # unhandled = on
        }
        provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
        serializers {
          wire = "Akka.Serialization.WireSerializer, Akka.Serialization.Wire"
        }
        serialization-bindings {
          "System.Object" = wire
        }
        deployment {
          /NotificationCoordinator/ApplePushNotificationActor {
            router = round-robin-pool
            resizer {
              enabled = on
              lower-bound = 3
              upper-bound = 5
            }
          }
          /NotificationCoordinator/AndroidPushNotificationActor {
            router = round-robin-pool
            resizer {
              enabled = on
              lower-bound = 3
              upper-bound = 5
            }
          }
          /NotificationCoordinator/EmailActor {
            router = round-robin-pool
            resizer {
              enabled = on
              lower-bound = 3
              upper-bound = 5
            }
          }
          /NotificationCoordinator/SmsActor {
            router = round-robin-pool
            resizer {
              enabled = on
              lower-bound = 3
              upper-bound = 5
            }
          }
          /NotificationCoordinator/LoggingCoordinator/ResponseLoggerActor {
            router = round-robin-pool
            resizer {
              enabled = on
              lower-bound = 3
              upper-bound = 5
            }
          }
        }
      }
      remote {
        log-remote-lifecycle-events = DEBUG
        log-received-messages = on
        helios.tcp {
          transport-class = "Akka.Remote.Transport.Helios.HeliosTcpTransport, Akka.Remote"
          applied-adapters = []
          transport-protocol = tcp
          # will be populated with a dynamic host-name at runtime if left uncommented
          # public-hostname = "POPULATE STATIC IP HERE"
          hostname = "***.***.**.**"
          port = 0
        }
      }
      cluster {
        seed-nodes = ["akka.tcp://NotificationSystem@***.***.**.**:5053", "akka.tcp://NotificationSystem@***.***.**.**:5073"]
        roles = [sender]
      }
    }
  ]]></hocon>
</akka>
How can we solve this problem? Thank you.