The error is:

'System.Replicator' reported Warning for property 'RemoteReplicatorConnectionStatus'.
Replica 132295460844367404 cannot be reached to start the copy process. Error Code: CannotConnect, Target listen address: localhost:62352/5298ce62-a8b6-4c10-944c-ce861fb5abd9-132295460844367404;70bcec58-3f57-4a23-b787-7353d53e631d:fdd277399fb82af80e7f8a0f097d244d. Verify that ReplicatorAddress config is valid.

There are 3 replicas, and 2 of them are stuck InBuild. The error is reported by the Primary replica, and the replica ID it complains about belongs to one of the secondary replicas that is stuck InBuild.

Everything I can find on this error relates to standalone clusters, but my cluster was created in Azure. What are some causes of this error? It only happens for my stateful service when I deploy multiple replicas.

In the Primary replica's events, the following error is shown for each of the other 2 replicas:

 "Description": "The api IReplicator.BuildReplica(132295460844367404) on node _default_4 is stuck. Start Time (UTC): 2020-03-24 17:55:24.215.",

If I set the replica count to 1, the error doesn't appear until I try to upgrade the application. At that point it creates an idle replica to swap in, which gets stuck on this error and causes the upgrade to hang indefinitely.

The same application can be deployed to my local 5 node cluster with no errors.

1 Answer

I started commenting out code to see if I could get to a state where it was working, and eventually narrowed it down to the way I was overriding the replicator settings.

I was doing this:

public MyStateFulService(StatefulServiceContext context)
    : base(context, new ReliableStateManager(context,
        new ReliableStateManagerConfiguration(
            new ReliableStateManagerReplicatorSettings
            {
                MaxReplicationMessageSize = 1073741824
            })))
{ }

and replacing it with this config section

<Section Name="ReplicatorConfig">
  <Parameter Name="ReplicatorEndpoint" Value="ReplicatorEndpoint" />
  <Parameter Name="MaxReplicationMessageSize" Value="524288000" />
  <Parameter Name="MinLogSizeInMB" Value="4096" />
</Section>

resolved the issue. I assume that by creating a new ReliableStateManagerReplicatorSettings object in code, I was replacing the default replicator settings, including the replicator endpoint that would otherwise be read from config.
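
For reference, the Value of the ReplicatorEndpoint parameter in that section is the name of an endpoint resource, not an address. Below is a minimal sketch of the matching declaration in ServiceManifest.xml. It assumes the names used by the default stateful service template; my actual manifest is not shown here, so treat the fragment as illustrative.

```xml
<!-- ServiceManifest.xml fragment (assumed: default template names) -->
<Resources>
  <Endpoints>
    <!-- The ReplicatorConfig section refers to this endpoint by name. -->
    <Endpoint Name="ReplicatorEndpoint" />
  </Endpoints>
</Resources>
```

The Name here and the Value of the ReplicatorEndpoint parameter need to match.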