
I am trying to understand why our actor service is using more disk space than expected. Our service currently contains around 80,000 actors distributed over 10 partitions. Each actor stores around 150 KB of state.

Looking at one (out of 10) nodes in our cluster, I would expect to see:

  • Disk space used for around 3 partitions (one as primary and two as secondary)
    • This is as expected
  • Drilling down into one partition folder, I would expect to see just one replica id
    • Not as expected:
      • I see the expected one (the one that matches the replica listed under the nodes section in Service Fabric Explorer). The replica id is prefixed with an R_
      • In the same partition folder, I see 3 other folders with replica ids starting with prefix S_. These replica ids do not match any value listed in Service Fabric Explorer under the Applications node.
  • Looking at the replica folder starting with R_, I would expect it to contain not much more than the state of 8,000 actors at around 150 KB each, so around 1.14 GB of data.
    • Not as expected:
      • The folder contains a file named ActorStateStore, and its size is 5.66 GB

Another thing that I am trying to understand is the following:

  • Version 1 of our application did not clean up unused actors. As you would expect, we saw the disk usage on each of the nodes grow at a steady pace.
  • Version 2 of our application started to delete unused actors. Since this new code would more than halve the number of active actors, I expected the overall disk usage to drop significantly after deployment.
    • This did not happen: the growth stopped, but the usage did not shrink.

So my questions are:

  1. Are my expectations correct?
  2. What could explain my observations?

1 Answer


Drilling down into one partition folder, I would expect to see just one replica id

If things have been running for a while, I'd expect to see more than one. This is because of two things:

  1. Service Fabric keeps the information for failed replicas on the nodes for at least the ReplicaRestartWaitDuration, so that if local recovery is possible, the node still has the information it needs. If a replica just failed and can't be cleanly dropped, for example, these sorts of files can accumulate. They can also be present if someone "ForceRemoved" individual replicas, since that explicitly skips clean shutdown. This is part of why we generally don't recommend using that command in production environments.
  2. There's also a setting known as the "UserStandbyReplicaKeepDuration", which governs how long SF keeps old replicas around that are not needed right now, in case they are needed later (because it's usually cheaper to rebuild a replica from partial state than from scratch).

    a. For example, say the node that some replica was on failed and stayed down longer than the ReplicaRestartWaitDuration for that service. When this happens, SF builds a replacement replica to get you back up to your TargetReplicaSetSize.

    b. Let's say that once that replica is built the node that failed comes back.

    c. If we're still within the UserStandbyReplicaKeepDuration for that replica, SF will just leave it there on disk. If there's another failure in the meantime, SF will usually (depending on the Cluster Resource Manager settings, whether this node is a valid target, etc.) pick this partial replica and rebuild the replacement from what remains on the drive.

    So you can see replicas from the past whose information is still being kept on the drives, but you generally shouldn't see anything older than the UserStandbyReplicaKeepDuration (by default a week). You can always reduce that duration in your cluster if you want.
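If you do decide to shorten it, the setting lives under the FailoverManager section of the cluster's fabricSettings. A hedged sketch of what that fragment might look like in an ARM template (the value is in seconds; 86400, one day, is just an illustration, not a recommendation):

```json
"fabricSettings": [
  {
    "name": "FailoverManager",
    "parameters": [
      {
        "name": "UserStandbyReplicaKeepDuration",
        "value": "86400"
      }
    ]
  }
]
```

Shortening it trades disk usage for recovery cost: replicas that fall outside the window have to be rebuilt from a full copy instead of from what's left on the drive.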

I would expect the folder to contain not much more than the size of 8,000 actors taking up around 150 KB each, so around 1.14 GB of data. Not as expected: the folder contains a file ActorStateStore and its size is 5.66 GB

This is a bit more puzzling. Let's go back to the amount of state we expect to be on a given node. You say you have 80K actors. I presume you have a TargetReplicaSetSize of 3, so that's really more like 240K actor replicas. Each actor is ~150 KB of state, so that's ~34 GB of state for the cluster. Per node, then, we'd expect ~3.4 GB of state. (I think your original estimate forgot replication. If you've actually got a TargetReplicaSetSize of 1, then let me know and we can recalculate.)
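Spelled out, that back-of-envelope math is:

      80,000 actors x 150 KB               = ~11.4 GB of raw actor state
    x 3 replicas (TargetReplicaSetSize)    = ~34 GB of state cluster-wide
    / 10 nodes                             = ~3.4 GB of state per node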

~3.4 GB is closer to your observation of ~5.7 GB, but still not quite close enough. Some other things to keep in mind:

  • Serialization overhead: the actor framework generally uses NetDataContractSerializer to serialize your actor state. You might want to test whether that's causing your 150 KB of state to end up 60% bigger on disk (that would be a lot of overhead, but it's not unheard of).
  • "Leftover" actors: if you're creating actors dynamically, one thing to keep in mind is that they don't get fully deleted until you explicitly tell SF to remove them:

    var serviceUri = ActorNameFormat.GetFabricServiceUri(typeof(IMyActor), actorAppName);
    var actorServiceProxy = ActorServiceProxy.Create(serviceUri, actorId.GetPartitionKey());
    await actorServiceProxy.DeleteActorAsync(actorId, cancellationToken);
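If you suspect leftovers, one way to check is to page through every ActorId the state provider still tracks. A rough sketch, assuming the Microsoft.ServiceFabric.Actors query API; IMyActor, actorAppName, and cancellationToken are carried over from the snippet above, and partitionKey is whichever partition you're inspecting:

```csharp
var serviceUri = ActorNameFormat.GetFabricServiceUri(typeof(IMyActor), actorAppName);
IActorService actorService = ActorServiceProxy.Create(serviceUri, partitionKey);

ContinuationToken continuationToken = null;
var trackedActors = new List<ActorInformation>();
do
{
    // GetActorsAsync pages through every ActorId that still has state in
    // this partition, whether or not the actor is currently active.
    PagedResult<ActorInformation> page =
        await actorService.GetActorsAsync(continuationToken, cancellationToken);
    trackedActors.AddRange(page.Items);
    continuationToken = page.ContinuationToken;
} while (continuationToken != null);

// Any ActorId in trackedActors that your application considers deleted is
// still occupying space until DeleteActorAsync is called for it.
```

Comparing that list against the actors your application believes exist tells you whether your deletes are actually reaching the state provider.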

The growth stopped but the usage did not shrink.

This could just be space that was allocated at the datastore level and isn't getting repacked/reclaimed. We'd need to look at what's actually still occupying space to understand the situation. Some of this depends on the actual persistence store (ESE/KVS vs. the dictionary-based state provider). It's also possible that the ActorIds you're generating changed somehow as part of your upgrade, so that the new code isn't able to reference the "old" ActorIds (but that feels unlikely).