Credit to johanandren and the linked article as basis for the following answer:
Both a router
and sharding
distribute work. Sharding
is required if, additionally to load balancing, the recipient actors have to reliably manage state that is directly associated with the entity identifier
.
To recap, the entity identifier
is a key, derived from the message being sent, determining the message's receipient actor in the cluster.
First of all, can you manage state associated with an identifier
across different nodes using a consistently hashing router? A Consistent Hash
router will always send messages with an equal identifier
to the same target actor. The answer is: No, as explained below.
The hash-based method stops working when nodes in the cluster go Down or come Up, because this changes the associated actor for some identifiers. If a node goes down, messages that were associated with it are now sent to a different actor in the network, but that actor is not informed about the former state of the actor which it is now replacing. Likewise, if a new node comes up, it will take care of messages (identifiers) that were previously associated with a different actor, and neither the new node or the old node are informed about this.
With sharding, on the other hand, the actors that are created are aware of the entity identifier
that they manage. Sharding
will make sure that there is exactly one actor managing the entity in the cluster. And it will re-create sharded actors on a different node if their parent node goes down. So using persistence
they will retain their (persisted) state across nodes when the number of nodes changes. You also don't have to worry about concurrency issues if an actor is re-created on a different node thanks to Sharding. Furthermore, if a message with a new entity identifier
is encountered, for which an actor does not exist yet, a new actor is created.
A consistently hashing router may still be of use for caching, because messages with the same key generally do go to the same actor. To manage a stateful entity that exists only once in the cluster, Sharding
is required.
Use routers for load balancing, use Sharding
for managing stateful entities in a distributed manner.