> Drilling down into one partition folder, I would expect to see just one replica id.
If things have been running for a while, I'd expect to see more than one. This is because of two things:
- Service Fabric keeps the information for failed replicas around on their nodes for at least the `ReplicaRestartWaitDuration`, so that if local recovery is possible, the necessary information is still on the node. If a replica fails and can't be cleanly dropped, for example, these sorts of files can accumulate. They can also be present if someone `ForceRemoved` individual replicas, since that explicitly skips clean shutdown; this is part of why we generally don't recommend using that command in production environments.
- There's also a setting known as the `UserStandbyReplicaKeepDuration`, which governs how long SF keeps old replicas around that are not needed right now, in case they are needed later (because it's usually cheaper to rebuild a replica from partial state than from scratch).
  a. For example, say the node some replica was on failed and stayed down longer than the `ReplicaRestartWaitDuration` for that service. When this happens, SF builds a replacement replica to get you back up to your `TargetReplicaSetSize`.
  b. Let's say that once that replacement replica is built, the node that failed comes back.
  c. If we're still within the `StandbyReplicaKeepDuration` for that replica, then SF will just leave it there on disk. If there's another failure in the meantime, SF will usually (depending on the Cluster Resource Manager settings, whether this node is a valid target, etc.) pick this partial replica and rebuild the replacement from what remains on the drive.
So you can see replicas from the past whose information is still being kept on the drives, but you generally shouldn't see anything older than the `UserStandbyReplicaKeepDuration` (one week by default). You can always reduce that duration in your cluster if you want.
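Both durations are per-service settings you can pick at creation time. A sketch of setting them when creating a stateful service through `FabricClient` (the application/service names and values here are hypothetical, not taken from your cluster):

```csharp
using System;
using System.Fabric;
using System.Fabric.Description;

var description = new StatefulServiceDescription
{
    ApplicationName = new Uri("fabric:/MyActorApp"),
    ServiceName = new Uri("fabric:/MyActorApp/MyActorService"),
    ServiceTypeName = "MyActorServiceType",
    HasPersistedState = true,
    TargetReplicaSetSize = 3,
    MinReplicaSetSize = 2,
    PartitionSchemeDescription = new UniformInt64RangePartitionSchemeDescription(10),
    // How long SF waits for a down replica to come back before building a replacement.
    ReplicaRestartWaitDuration = TimeSpan.FromMinutes(30),
    // How long SF keeps the on-disk state of replicas it no longer needs.
    StandByReplicaKeepDuration = TimeSpan.FromDays(1),
};

using (var client = new FabricClient())
{
    await client.ServiceManager.CreateServiceAsync(description);
}
```

Existing services can be changed after the fact as well (for example with `Update-ServiceFabricService`), so you don't have to recreate anything just to shorten the keep duration.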
> I would expect the folder to contain not much more than the size of 8000 actors taking up around 150 KB each, so around 1.14 GB of data.
> Not as expected: the folder contains a file ActorStateStore and its size is 5.66 GB.
This is a bit more puzzling. Let's go back to the amount of state we expect to be on a given node. You say you have 80K actors. I presume you have a `TargetReplicaSetSize` of 3, so that's really more like 240K actor copies. Each actor is ~150 KB of state, so that's ~34 GB of state for the cluster. Per node, then, we'd expect ~3.4 GB of state. (I think your original estimate forgot replication. If you've actually got a `TargetReplicaSetSize` of 1, then let me know and we can recalculate.)
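As a back-of-the-envelope check (the 10-node count is my assumption, implied by the ~3.4 GB per-node figure; your cluster size may differ):

```csharp
using System;

long actors = 80_000;
long bytesPerActor = 150L * 1024;   // ~150 KB of state each
int replicas = 3;                   // TargetReplicaSetSize
int nodes = 10;                     // assumed cluster size

long clusterBytes = actors * bytesPerActor * replicas;
long perNodeBytes = clusterBytes / nodes;

// Roughly 34 GB cluster-wide, roughly 3.4 GB per node
Console.WriteLine($"{clusterBytes / (double)(1L << 30):F1} GB cluster, " +
                  $"{perNodeBytes / (double)(1L << 30):F1} GB/node");
```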
~3.4 GB is closer to your observation of ~5.7 GB, but not quite close enough. Some other things to keep in mind:
- Serialization overhead: the actor framework generally uses `NetDataContractSerializer` to serialize the data in your actor state. You might want to test whether that's causing your 150 KB of state to be 60% bigger on disk (that would be a lot of overhead, but it's not unheard of).
- "Leftover" actors: if you're creating actors dynamically, one thing to keep in mind is that they don't get fully deleted until you tell SF to remove them:
```csharp
var serviceUri = ActorNameFormat.GetFabricServiceUri(typeof(IMyActor), actorAppName);
var actorServiceProxy = ActorServiceProxy.Create(actorId.GetPartitionKey(), serviceUri);
await actorServiceProxy.DeleteActorAsync(actorId, cancellationToken);
```
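To check the serialization-overhead point above, you can measure what one actor's state actually serializes to. A sketch, where `MyActorState` is a stand-in for your real state type:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

var state = new MyActorState();      // populate with representative data first
var serializer = new NetDataContractSerializer();

using (var stream = new MemoryStream())
{
    serializer.Serialize(stream, state);
    // Compare this against your expected ~150 KB of raw state.
    Console.WriteLine($"Serialized size: {stream.Length} bytes");
}
```

If the serialized size is much larger than your in-memory estimate, that overhead multiplied across 240K actor copies would account for a good chunk of the gap.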
> The growth stopped but the usage did not shrink.
This could just be space that was allocated at the datastore level and isn't getting repacked/reclaimed. We'd need to look at what's actually still occupying space to understand the situation. Some of this depends on the actual persistence store (ESE/KVS vs. the dictionary-based state provider). It's also possible that the ActorIds you're generating changed somehow as part of your upgrade, so that the new code isn't able to reference the "old" ActorIds (but that feels unlikely).