I've been reading about the read-only lock-free transactions as implemented in Google Spanner and CockroachDB. Both claim to be implemented in a lock-free manner by making use of system clocks. Before getting to the question, here is my understanding (please skip the following section if you are aware of the machineries in both systems or just in CockroachDB):
- Spanner's approach is simpler -- before committing a write transaction, Spanner picks the max timestamp across all involved shards as the commit timestamp, adds a wait, called commit wait, to for the max clock error before returning from it's write transaction. This means that all causally dependent transactions (both reads and writes) will have a timestamp value higher than the commit timestamp of the previous write. For read transactions, we pick the latest timestamp on the serving node. For example, if there was a write committed at timestamp 5, and the max clock error was 2, future writes and reads-only transactions will at least have a timestamp of 7.
- CockroachDB on the other hand, does something more complicated. On writes, it picks the highest timestamp among all the involved shards, but does not wait. On reads, it assigns a preliminary read timestamp as the current timestamp on the serving node, then proceeds optimistically by reading across all shards and restarting the read transaction if any key on any shard reports a write timestamp that might imply uncertainty about whether the write causally preceeded the read transaction. It assumes that keys with write timestamps less than the timestamp for the read transaction either appeared before the read transaction or were concurrent with it. The uncertainty machinery kicks in on timestamps higher than the read transaction timestamp. For example, if there was a write committed at timestamp 8, and a read transaction was assigned timestamp 7, we are unsure about whether that write came before the read or after, so we restart the read transaction with a read timestamp of 8.
Relevant sources - https://www.cockroachlabs.com/blog/living-without-atomic-clocks/ and https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf
Given this implementation does CockroachDB guarantee that the following two transactions will not see a violation of serializability?
- A user blocks another user, then posts a message that they don't want the blocked user to see as one write transaction.
- The blocked user loads their friends list and their posts as one read transaction.
As an example, consider that the friends list and posts lived on different shards. And the following ordering happens (assuming a max clock error of 2)
- The initial posts and friends list was committed at timestamp 5.
- A read transaction starts at timestamp 7, it reads the friends list, which it sees as being committed at timestamp 5.
- Then the write transaction for blocking the friend and making a post gets committed at 6.
- The read transaction reads the posts, which it sees as being committed at timestamp 6.
Now, the transactions violate serializability becasue the read transaction saw an old write and a newer write in the same transaction.
What am I missing?