How to keep state consistent across distributed systems

Question

When building distributed systems, it must be ensured the client and the server eventually ends up with consistent view of the data they are operating on, i.e they never get out of sync. Extra care is needed, because network can not be considered reliable. In other words, in the case of network failure, client never knows if the operation was successful, and may decide to retry the call.

Consider a microservice, which exposes simple CRUD API, and unbounded set of clients, maintained in-house by the same team, by different teams and by different companies also.

In the example, client request a creation of new entity, which the microservice successfully creates and persists, but the network fails and client connection times out. The client will most probably retry, unknowingly persisting the same entity second time. Here is one possible solution to this I came up with:

Use client-generated identifier to prevent duplicate post

This could mean the primary key as it is, the half of the client and server -generated composite key, or the token issued by the service. A service would either persist the entity, or reply with OK message in the case the entity with that identifier is already present.

But there is more to this: What if the client gives up after network failure (but entity got persisted), mutates it's internal view of the entity, and later decides to persist it in the service with the same id. At this point and generally, would it be reasonable for the service just silently:

Update the existing entity with the state that client posted

Or should the service answer with some more specific status code about what happened? The point is, developer of the service couldn't really influence the client design solutions.

So, what are some sensible practices to keep the state consistent across distributed systems and avoid most common pitfalls in the case of network and system failure?

I don't understand the usecase "Assign the existing entity with the one that client posted". Could you give me more details or help me understand? — Constantin Galbenu
Well, it just means delete the old entity if exists, and insert the new one client posted. The old one and new one has the same primary key. Alternatively the server could answer the entity exists already (as it does), but it would have different state than the client posted — Tuomas Toivonen
But why should the server delete the old entity? Why doesn't it just leave it that way and when the client updates it they will get in sync. — Constantin Galbenu
The server doesn't know what the client knows, so it shouldn't do anything unless the client requests it. — Constantin Galbenu

Constantin Galbenu Constantin Galbenu · Accepted Answer · 2018-02-19T08:53:00

There are some things that you can do to minimize the impact of the client-server out-of-sync situation.

The first measure that you can take is to let the client generate the entity IDs, for example by using GUIDs. This prevents the server to generate a new entity every time the client retries a CreateEntityCommand.

In addition, you can make the command handing idempotent. This means that if the server receives a second CreateEntityCommand, it just silently ignores it (i.e. it does not throw an exception). This depends on every use case; some commands cannot be made idempotent (like updateEntity).

Another thing that you can do is to de-duplicate commands. This means that every command that you send to a server must be tagged with an unique ID. This can also be a GUID. When the server receives a command with an ID that it already had processed then it ignores it and gives a positive response (i.e. 200), maybe including some meta-information about the fact that the command was already processed. The command de-duplication can be placed on top of the stack, as a separate layer, independent of the domain (i.e. in front of the Application layer).

How to keep state consistent across distributed systems

1 Answers