I am having trouble joining an Akka.NET cluster and debugging the join. I am using version 1.3.8. My setup is as follows:

Lighthouse

Almost the default code from GitHub, running as a console application. Its akka.hocon is as follows:

lighthouse {
  actorsystem: "sng"
}

petabridge.cmd{
    host = "0.0.0.0"
    port = 9110
}

akka {
  loglevel = DEBUG
  loggers = ["Akka.Logger.Serilog.SerilogLogger, Akka.Logger.Serilog"]
  actor {
    provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
    debug {
      receive = on
      autoreceive = on
      lifecycle = on
      event-stream = on
      unhandled = on
    }
  }
  remote {
    log-sent-messages = on
    log-received-messages = on
    log-remote-lifecycle-events = on
    enabled-transports = ["akka.remote.dot-netty.tcp"]
    dot-netty.tcp {
      transport-class = "Akka.Remote.Transport.DotNetty.TcpTransport, Akka.Remote"
      applied-adapters = []
      transport-protocol = tcp
      hostname = "0.0.0.0"
      port = 4053
    }
    log-remote-lifecycle-events = DEBUG
  }            
  cluster {
    auto-down-unreachable-after = 5s
    seed-nodes = [] 
    roles = [lighthouse]
  }
}

Working node

Also a console (net461) application, with the simplest possible startup and join logic. It works as expected. akka.hocon:

akka {
  loglevel = DEBUG
  loggers = ["Akka.Logger.Serilog.SerilogLogger, Akka.Logger.Serilog"]
  actor {
    provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
  }

  remote {
    log-sent-messages = on
    log-received-messages = on
    log-remote-lifecycle-events = on
    dot-netty.tcp {
      transport-class = "Akka.Remote.Transport.DotNetty.TcpTransport, Akka.Remote"
      applied-adapters = []
      transport-protocol = tcp
      hostname = "0.0.0.0"
      port = 0
    }
  }            

  cluster {
    auto-down-unreachable-after = 5s
    seed-nodes = ["akka.tcp://[email protected]:4053"] 
    roles = [monitor]
  }
}

Not working node

A .NET 4.6.1 library, registered as COM and started from another application (MediaMonkey) with VBA code:

Sub OnStartup
   Set o = CreateObject("MediaMonkey.Akka.Agent.MediaMonkeyAkkaProxy")
   o.Init(SDB)
End Sub

As in the console application, the Akka system is created with the standard ActorSystem.Create("sng", config);

akka.hocon:

akka {
  loglevel = DEBUG
  loggers = ["Akka.Logger.Serilog.SerilogLogger, Akka.Logger.Serilog"]
  actor {
    provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
  }
  remote {
    log-sent-messages = on
    log-received-messages = on
    log-remote-lifecycle-events = on
    dot-netty.tcp {
      transport-class = "Akka.Remote.Transport.DotNetty.TcpTransport, Akka.Remote"
      applied-adapters = []
      transport-protocol = tcp
      hostname = "0.0.0.0"
      port = 0
    }
  }            
  cluster {
    auto-down-unreachable-after = 5s
    seed-nodes = ["akka.tcp://[email protected]:4053"] 
    roles = [mediamonkey]
  }
}

Debugging workflow

  1. Started the Lighthouse application:

    Configuration Result:
    [Success] Name sng.Lighthouse
    [Success] ServiceName sng.Lighthouse
    Topshelf v4.0.0.0, .NET Framework v4.0.30319.42000
    [Lighthouse] ActorSystem: sng; IP: 127.0.0.1; PORT: 4053
    [Lighthouse] Performing pre-boot sanity check. Should be able to parse address [akka.tcp://[email protected]:4053]
    [Lighthouse] Parse successful.
    [21:01:35 INF] Starting remoting
    [21:01:35 INF] Remoting started; listening on addresses : [akka.tcp://[email protected]:4053]
    [21:01:35 INF] Remoting now listens on addresses: [akka.tcp://[email protected]:4053]
    [21:01:35 INF] Cluster Node [akka.tcp://[email protected]:4053] - Starting up...
    [21:01:35 INF] Cluster Node [akka.tcp://[email protected]:4053] - Started up successfully
    The sng.Lighthouse service is now running, press Control+C to exit.
    [21:01:35 INF] petabridge.cmd host bound to [0.0.0.0:9110]
    [21:01:35 INF] Node [akka.tcp://[email protected]:4053] is JOINING, roles [lighthouse]
    [21:01:35 INF] Leader is moving node [akka.tcp://[email protected]:4053] to [Up]

  2. Started and stopped the working console node

Lighthouse logs:

[21:05:40 INF] Node [akka.tcp://[email protected]:37516] is JOINING, roles [monitor]
[21:05:40 INF] Leader is moving node [akka.tcp://[email protected]:37516] to [Up]
[21:05:54 INF] Connection was reset by the remote peer. Channel [[::ffff:127.0.0.1]:4053->[::ffff:127.0.0.1]:37517](Id=1293c63a)
[21:05:54 INF] Message AckIdleCheckTimer from akka://sng/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fsng%400.0.0.0%3A37516-1/endpointWriter to akka://sng/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fsng%400.0.0.0%3A37516-1/endpointWriter was not delivered. 1 dead letters encountered.
[21:05:55 INF] Message GossipStatus from akka://sng/system/cluster/core/daemon to akka://sng/deadLetters was not delivered. 2 dead letters encountered.
[21:05:55 INF] Message Heartbeat from akka://sng/system/cluster/core/daemon/heartbeatSender to akka://sng/deadLetters was not delivered. 3 dead letters encountered.
[21:05:56 INF] Message GossipStatus from akka://sng/system/cluster/core/daemon to akka://sng/deadLetters was not delivered. 4 dead letters encountered.
[21:05:56 INF] Message Heartbeat from akka://sng/system/cluster/core/daemon/heartbeatSender to akka://sng/deadLetters was not delivered. 5 dead letters encountered.
[21:05:57 INF] Message GossipStatus from akka://sng/system/cluster/core/daemon to akka://sng/deadLetters was not delivered. 6 dead letters encountered.
[21:05:57 INF] Message Heartbeat from akka://sng/system/cluster/core/daemon/heartbeatSender to akka://sng/deadLetters was not delivered. 7 dead letters encountered.
[21:05:58 INF] Message GossipStatus from akka://sng/system/cluster/core/daemon to akka://sng/deadLetters was not delivered. 8 dead letters encountered.
[21:05:58 INF] Message Heartbeat from akka://sng/system/cluster/core/daemon/heartbeatSender to akka://sng/deadLetters was not delivered. 9 dead letters encountered.
[21:05:59 WRN] Cluster Node [akka.tcp://[email protected]:4053] - Marking node(s) as UNREACHABLE [Member(address = akka.tcp://[email protected]:37516, Uid=1060233119 status = Up, role=[monitor], upNumber=2)]. Node roles [lighthouse]
[21:06:01 WRN] AssociationError [akka.tcp://[email protected]:4053] -> akka.tcp://[email protected]:37516: Error [Association failed with akka.tcp://[email protected]:37516] []
[21:06:01 WRN] Tried to associate with unreachable remote address [akka.tcp://[email protected]:37516]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [Association failed with akka.tcp://[email protected]:37516] Caused by: [System.AggregateException: One or more errors occurred. ---> Akka.Remote.Transport.InvalidAssociationException: No connection could be made because the target machine actively refused it tcp://[email protected]:37516
   at Akka.Remote.Transport.DotNetty.TcpTransport.<AssociateInternal>d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Akka.Remote.Transport.DotNetty.DotNettyTransport.<Associate>d__22.MoveNext()
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Akka.Remote.Transport.ProtocolStateActor.<>c.<InitializeFSM>b__11_54(Task`1 result)
   at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.Tasks.Task.Execute()
---> (Inner Exception #0) Akka.Remote.Transport.InvalidAssociationException: No connection could be made because the target machine actively refused it tcp://[email protected]:37516
   at Akka.Remote.Transport.DotNetty.TcpTransport.<AssociateInternal>d__1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Akka.Remote.Transport.DotNetty.DotNettyTransport.<Associate>d__22.MoveNext()<---
]
[21:06:04 INF] Cluster Node [akka.tcp://[email protected]:4053] - Leader is auto-downing unreachable node [akka.tcp://[email protected]:4053]
[21:06:04 INF] Marking unreachable node [akka.tcp://[email protected]:37516] as [Down]
[21:06:05 INF] Leader is removing unreachable node [akka.tcp://[email protected]:37516]
[21:06:05 WRN] Association to [akka.tcp://[email protected]:37516] having UID [1060233119] is irrecoverably failed. UID is now quarantined and all messages to this UID will be delivered to dead letters. Remote actorsystem must be restarted to recover from this situation.

Working node logs:

[21:05:38 INF] Starting remoting
[21:05:38 INF] Remoting started; listening on addresses : [akka.tcp://[email protected]:37516]
[21:05:38 INF] Remoting now listens on addresses: [akka.tcp://[email protected]:37516]
[21:05:38 INF] Cluster Node [akka.tcp://[email protected]:37516] - Starting up...
[21:05:38 INF] Cluster Node [akka.tcp://[email protected]:37516] - Started up successfully
[21:05:40 INF] Welcome from [akka.tcp://[email protected]:4053]
[21:05:40 INF] Member is Up: Member(address = akka.tcp://[email protected]:4053, Uid=439782041 status = Up, role=[lighthouse], upNumber=1)
[21:05:40 INF] Member is Up: Member(address = akka.tcp://[email protected]:37516, Uid=1060233119 status = Up, role=[monitor], upNumber=2)
//shutdown logs are missing

  3. Started and stopped the COM node

Lighthouse logs:

[21:12:02 INF] Connection was reset by the remote peer. Channel [::ffff:127.0.0.1]:4053->[::ffff:127.0.0.1]:37546](Id=4ca91e15)

COM node logs:

[WARNING][18. 07. 2018 19:11:15][Thread 0001][ActorSystem(sng)] The type name for serializer 'hyperion' did not resolve to an actual Type: 'Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion'
[WARNING][18. 07. 2018 19:11:15][Thread 0001][ActorSystem(sng)] Serialization binding to non existing serializer: 'hyperion'
[21:11:15 DBG] Logger log1-SerilogLogger [SerilogLogger] started
[21:11:15 DBG] StandardOutLogger being removed
[21:11:15 DBG] Default Loggers started
[21:11:15 INF] Starting remoting
[21:11:15 DBG] Starting prune timer for endpoint manager...
[21:11:15 INF] Remoting started; listening on addresses : [akka.tcp://[email protected]:37543]
[21:11:15 INF] Remoting now listens on addresses: [akka.tcp://[email protected]:37543]
[21:11:15 INF] Cluster Node [akka.tcp://[email protected]:37543] - Starting up...
[21:11:15 INF] Cluster Node [akka.tcp://[email protected]:37543] - Started up successfully
[21:11:15 DBG] [Uninitialized] Received Akka.Cluster.InternalClusterAction+Subscribe
[21:11:15 DBG] [Uninitialized] Received Akka.Cluster.InternalClusterAction+Subscribe
[21:11:16 DBG] [Uninitialized] Received Akka.Cluster.InternalClusterAction+JoinSeedNodes
[21:11:16 DBG] [Uninitialized] Received Akka.Cluster.InternalClusterAction+Subscribe
[21:11:26 WRN] Couldn't join seed nodes after [2] attempts, will try again. seed-nodes=[akka.tcp://[email protected]:4053]
[21:11:31 WRN] Couldn't join seed nodes after [3] attempts, will try again. seed-nodes=[akka.tcp://[email protected]:4053]
[21:11:36 WRN] Couldn't join seed nodes after [4] attempts, will try again. seed-nodes=[akka.tcp://[email protected]:4053]
[21:11:40 ERR] No response from remote. Handshake timed out or transport failure detector triggered.
[21:11:40 WRN] AssociationError [akka.tcp://[email protected]:37543] -> akka.tcp://[email protected]:4053: Error [Association failed with akka.tcp://[email protected]:4053] []
[21:11:40 WRN] Tried to associate with unreachable remote address [akka.tcp://[email protected]:4053]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [Association failed with akka.tcp://[email protected]:4053] Caused by: [Akka.Remote.Transport.AkkaProtocolException: No response from remote. Handshake timed out or transport failure detector triggered.
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Akka.Remote.Transport.AkkaProtocolTransport.<Associate>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Akka.Remote.EndpointWriter.<AssociateAsync>d__23.MoveNext()]
[21:11:40 DBG] Disassociated [akka.tcp://[email protected]:37543] -> akka.tcp://[email protected]:4053
[21:11:40 INF] Message InitJoin from akka://sng/system/cluster/core/daemon/joinSeedNodeProcess-1 to akka://sng/deadLetters was not delivered. 1 dead letters encountered.
[21:11:40 INF] Message InitJoin from akka://sng/system/cluster/core/daemon/joinSeedNodeProcess-1 to akka://sng/deadLetters was not delivered. 2 dead letters encountered.
[21:11:40 INF] Message InitJoin from akka://sng/system/cluster/core/daemon/joinSeedNodeProcess-1 to akka://sng/deadLetters was not delivered. 3 dead letters encountered.
[21:11:40 INF] Message InitJoin from akka://sng/system/cluster/core/daemon/joinSeedNodeProcess-1 to akka://sng/deadLetters was not delivered. 4 dead letters encountered.
[21:11:40 INF] Message InitJoin from akka://sng/system/cluster/core/daemon/joinSeedNodeProcess-1 to akka://sng/deadLetters was not delivered. 5 dead letters encountered.
[21:11:40 INF] Message AckIdleCheckTimer from akka://sng/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fsng%40127.0.0.1%3A4053-1/endpointWriter to akka://sng/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2Fsng%40127.0.0.1%3A4053-1/endpointWriter was not delivered. 6 dead letters encountered.
[21:11:41 WRN] Couldn't join seed nodes after [5] attempts, will try again. seed-nodes=[akka.tcp://[email protected]:4053]
[21:11:41 INF] Message InitJoin from akka://sng/system/cluster/core/daemon/joinSeedNodeProcess-1 to akka://sng/deadLetters was not delivered. 7 dead letters encountered.
[21:11:46 WRN] Couldn't join seed nodes after [6] attempts, will try again. seed-nodes=[akka.tcp://[email protected]:4053]
[21:11:51 WRN] Couldn't join seed nodes after [7] attempts, will try again. seed-nodes=[akka.tcp://[email protected]:4053]

Do you have any idea how to debug and/or resolve this?

1 Answer

The first thing I notice is that the HOCON configuration of the non-working node contains a different "seed-nodes" address from the working node.

IMHO the "seed-nodes" setting needs to be the same in every application (node, in cluster terms) within the cluster. So in the non-working node, instead of

seed-nodes = ["akka.tcp://[email protected]:4053"] 

use the following, which is what the working node has:

seed-nodes = ["akka.tcp://[email protected]:4053"] 
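
To make the suggestion concrete, here is a minimal sketch of the cluster section as it might look on the non-working (COM) node. The seed address is an assumption on my side: the host and port are taken from the Lighthouse startup log line "ActorSystem: sng; IP: 127.0.0.1; PORT: 4053", so adjust it to whatever your Lighthouse actually advertises. The remaining keys are copied from the question.

```hocon
# Sketch only, not a verified fix.
# Assumption: Lighthouse is reachable at 127.0.0.1:4053, as its startup log
# reports ("ActorSystem: sng; IP: 127.0.0.1; PORT: 4053").
akka {
  cluster {
    auto-down-unreachable-after = 5s
    # Use the same seed list on every node; the actor system name ("sng")
    # in the address has to match the name passed to ActorSystem.Create.
    seed-nodes = ["akka.tcp://sng@127.0.0.1:4053"]
    # Only the role should differ between nodes (monitor, mediamonkey, ...).
    roles = [mediamonkey]
  }
}
```

If both nodes already use this exact value, the seed list is probably not the cause and the difference has to be somewhere else.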

Also, please see this sample on GitHub: https://github.com/AJEETX/Akka.Cluster

and another one: https://github.com/AJEETX/AkkaNet.Cluster.RoundRobinGroup

@Rok, kindly let me know whether this was helpful, or I can investigate further.