I'm not sure about proper design of an approach.
We use optimistic locking using long
incremented version placed on every entity. Each update of such entity is executed via compare-and-swap algorithm which just succeed or fail depending on whether some other client updates entity in the meantime or not. Classic optimistic locking as e.g. hibernate do.
We also need to adopt re-trying approach. We use http
based storage (etcd) and it can happen that some update request is just timeouted.
And here it's the problem. How to combine optimistic locking and re-try. Here is the specific issue I'm facing.
Let say I have an entity having version=1
and I'm trying to update it. Next version is obviously 2
. My client than executes conditional update. It's successfully executed only when the version in persistence is 1
and it's atomically updated to version=2
. So far, so good.
Now, let say that a response for the update request does not arrive. It's impossible to say if it succeeded or not at this moment. The only thing I can do now is to re-try the update again. In memory entity still contains version=1
intending to update value to 2
.
The real problem arise now. What if the second update fails because a version in persistence is 2
and not 1
?
There is two possible reasons:
- first request indeed caused the update - the operation was successful but the response got lost or my client timeout, whatever. It just did not arrived but it passed
- some other client performed the update concurrently on the background
Now I can't say what is true. Did my client update the entity or some other client did? Did the operation passed or failed?
Current approach we use just compares persisted entity and the entity in main memory. Either as java equal or json content equality. If they are equal, the update methods is declared as successful. I'm not satisfied with the algorithm as it's not both cheap and reasonable for me.
Another possible approach is to do not use long
version but timestamp
instead. Every client generates own timestamp within the update operation in the meaning that potential concurrent client would generate other in high probability. The problem for me is the probability, especially when two concurrent updates would come from same machine.
Is there any other solution?