4
votes

I had a Service Fabric (SF) cluster made of 3 Standard A0 nodes. I scaled the cluster in to 1 node and realized this was a bad idea, because nothing worked in that state (not even SF Explorer). I then scaled it back out to 3 nodes and restarted the primary scale set. Now all nodes in the scale set are up and running, but the SF cluster status is "Upgrade service unreachable". I saw a similar question, "Service Fabric Status: Upgrade service unreachable", where scaling the nodes up to D2 was recommended, but this hasn't solved my problem. I connected to one node via RDP; here are some event logs:

EventLog -> Applications and Service Logs -> Microsoft Service Fabric -> Operational:

Node name: _SSService_0 has failed to open with upgrade domain: 0, fault domain: fd:/0, address: 10.0.0.4, hostname: SSService000000, isSeedNode: true, versionInstance: 5.6.210.9494:3, id: d9e8bae2d4d8116bfefb989b95e91f7b, dca instance: 131405546580494698, error: FABRIC_E_TIMEOUT

EventLog -> Applications and Service Logs -> Microsoft Service Fabric -> Admin:

client-10.0.0.4:19000/10.0.0.4:19000: error = 2147943625, failureCount=487. Filter by (type~Transport.St && ~"(?i)10.0.0.4:19000") to get listener lifecycle. Connect failure is expected if listener was never started, or listener/its process was stopped before/during connecting.
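The numeric error in the Admin log entry above can be read as a Windows HRESULT. A minimal sketch of the decoding, assuming the value follows the standard HRESULT layout (the helper name `decode_hresult` and the one-entry lookup table are illustrative, not part of any Service Fabric tooling):

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its failure bit, facility and code fields."""
    value &= 0xFFFFFFFF  # normalise to an unsigned 32-bit value
    return {
        "hex": f"0x{value:08X}",
        "failure": bool(value >> 31),       # top bit set means failure
        "facility": (value >> 16) & 0x7FF,  # 7 is FACILITY_WIN32
        "code": value & 0xFFFF,             # underlying Win32 error code
    }


# Lookup limited to the single code that appears in the log above.
WIN32_NAMES = {1225: "ERROR_CONNECTION_REFUSED"}

info = decode_hresult(2147943625)
name = WIN32_NAMES.get(info["code"], "unknown")
print(info["hex"], info["facility"], info["code"], name)
# prints: 0x800704C9 7 1225 ERROR_CONNECTION_REFUSED
```

A refused connection on 10.0.0.4:19000 matches the log's own hint that the listener was never started or was stopped before the client connected.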

2
From what I recall, the only solution (without plenty of faffing around) is to destroy and recreate the cluster! - Mardoxx
Running into the same error after scaling up. Recreating the cluster isn't enough, because the existing VMSS then complains about a cluster unique identifier mismatch. Luckily it is just a test environment cluster in Azure. - rfcdejong

2 Answers

1
votes

If you scale the cluster down by resizing the VM scale set to 1 instance, you are basically destroying the cluster, because it requires a minimum of 3 nodes by design. The only way forward is then to recreate it from scratch.
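The 3-node minimum described above can be checked before a scale-in is issued. The following is only a sketch of such a guard: the function name `check_scale_in` and the constant `MIN_NODES` are hypothetical, and the value 3 is taken from this answer rather than from any API.

```python
MIN_NODES = 3  # minimum stated in this answer; adjust to your cluster's requirements


def check_scale_in(target_capacity: int, min_nodes: int = MIN_NODES) -> None:
    """Raise ValueError if the requested scale set capacity is below the minimum."""
    if target_capacity < min_nodes:
        raise ValueError(
            f"Refusing to scale to {target_capacity} node(s): "
            f"the cluster needs at least {min_nodes}."
        )


check_scale_in(3)  # passes silently

try:
    check_scale_in(1)  # the scale-in described in the question
except ValueError as exc:
    print(exc)
```

Calling a guard like this from a deployment script before resizing the scale set would have rejected the scale-in to 1 node that broke the cluster in the question.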

If you need a tiny cluster consisting of just 1 node (for testing purposes, say), Azure now offers a way to create a single-node cluster, but you won't be able to scale it, as it is a special case not intended for production use.

1
votes

The "Upgrade service unreachable" status appears when the number of active VMs (nodes) in the cluster drops to 0 for any reason. In my case, this happened because I restarted all the VMs at the same time. In this state, the nodes are available and running, but they have been disconnected from the cluster.


I resolved this by deallocating and then restarting the node from the virtual machine scale set.
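The answer performs this step in the Azure portal. A rough command-line equivalent, assuming the Azure CLI is installed and logged in, could look like the sketch below. It relies on the `az vmss deallocate` and `az vmss start` commands; the resource group name, scale set name and instance ID are placeholders, and the helper names are made up for illustration.

```python
import subprocess


def vmss_restart_commands(resource_group: str, vmss_name: str, instance_id: str) -> list:
    """Build the two az CLI commands: deallocate the instance, then start it again."""
    common = [
        "--resource-group", resource_group,
        "--name", vmss_name,
        "--instance-ids", instance_id,
    ]
    return [
        ["az", "vmss", "deallocate", *common],
        ["az", "vmss", "start", *common],
    ]


def run(commands: list, dry_run: bool = True) -> None:
    """Print the commands, or execute them in order when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing command


# Placeholder names; substitute your own resource group and scale set.
commands = vmss_restart_commands("my-resource-group", "my-scale-set", "0")
run(commands)  # dry run: only prints the commands
```

The dry-run default is deliberate, so the commands can be reviewed before anything is deallocated.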