Google Spanner recommends against using things like timestamps or sequential numbers as the initial part of a primary key or index, which makes sense based on the architecture. However, for my requirements, I do need some way of ensuring a strict “append only” ordering of rows.
I’m using Spanner to model events (as in event sourcing). Each event has a category, a stream id identifying a sequence in which events need to be strictly ordered relative to each other, and a few payload fields; I'll ignore the actual payload from here on.
Naively, this would be modelled as:

| Column | Type |
| --- | --- |
| Category | STRING |
| Stream Id | STRING |
| Sequence Nr | INT64 |
(With a primary key consisting of Category, Stream Id, Sequence Nr.) This would ensure a strong ordering of events within one stream. However, some categories have a lot of events associated with them, and Spanner best practice is to have high variance in the leading part of the key, so it would be better to flip this around. Each "stream" will contain a reasonably small number of events (thousands, not millions) and will be read together, so leading with the stream id should give better distribution of data while keeping the events of one stream local to each other (a rough DDL sketch follows the table):

| Column | Type |
| --- | --- |
| Stream Id | STRING |
| Category | STRING |
| Sequence Nr | INT64 |
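
For concreteness, here's roughly what I mean, issued through the Python client. This is only a sketch: the table name, column names, STRING lengths, and instance/database ids are placeholders.

```python
# Sketch: create the reordered schema with the stream id as the leading key part.
# Instance/database ids and the STRING lengths are placeholders, not real values.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-database")

database.update_ddl([
    """
    CREATE TABLE Events (
        StreamId   STRING(64) NOT NULL,
        Category   STRING(64) NOT NULL,
        SequenceNr INT64      NOT NULL,
    ) PRIMARY KEY (StreamId, Category, SequenceNr)
    """
]).result()  # wait for the schema change to finish
```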
However, as I'd like to be able to append events without having to read the current state to find out the current sequence number, I'd rather use a timestamp (again, a DDL sketch follows the table):

| Column | Type | Options |
| --- | --- | --- |
| Stream Id | STRING | |
| Category | STRING | |
| Timestamp | TIMESTAMP | allow_commit_timestamp |
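
The DDL for that variant would presumably look something like this (again just a sketch with invented names); the important part is the `allow_commit_timestamp` option on the timestamp column:

```python
# Sketch: same layout, but the last key part is a commit-timestamp column.
# The OPTIONS clause marks the column as accepting Spanner's commit timestamp.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-database")

database.update_ddl([
    """
    CREATE TABLE Events (
        StreamId STRING(64) NOT NULL,
        Category STRING(64) NOT NULL,
        Ts       TIMESTAMP  NOT NULL OPTIONS (allow_commit_timestamp = true),
    ) PRIMARY KEY (StreamId, Category, Ts)
    """
]).result()
```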
Spanner has a built-in commit timestamp that stamps the row with the time the transaction is actually committed; a rough sketch of how I'd write events that way follows.
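Concretely, I'd write events like this, letting Spanner fill in the timestamp via the Python client's `COMMIT_TIMESTAMP` sentinel. Note that both rows go into one transaction, which is exactly the case I'm worried about, since they would presumably share a single commit timestamp (and hence collide on the primary key above):

```python
# Sketch of the write path: two events for the same stream in one transaction,
# with the Ts column filled in by Spanner at commit time.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-database")

with database.batch() as batch:
    batch.insert(
        table="Events",
        columns=("StreamId", "Category", "Ts"),
        values=[
            ("stream-1", "orders", spanner.COMMIT_TIMESTAMP),
            ("stream-1", "orders", spanner.COMMIT_TIMESTAMP),
        ],
    )
```

Which brings me, finally, to the question: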
Is it possible to represent data as above and get unique commit timestamps even if I commit multiple events in one transaction?
If not, is it possible to guarantee strict ordering some other way, for example by adding additional columns (roughly sketched below)?
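To make that concrete, this is roughly the kind of workaround I have in mind (purely hypothetical; the `BatchIdx` column and the key `(StreamId, Category, Ts, BatchIdx)` are my invention): the writer assigns an increasing index within each transaction as a tie-breaker, since all rows written in that transaction would share the same commit timestamp.

```python
# Hypothetical tie-breaker scheme: assumed key (StreamId, Category, Ts, BatchIdx).
# BatchIdx orders events written in the same transaction; Ts orders transactions.
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-database")

def append_events(transaction, stream_id, category, payloads):
    # One row per event; all rows get the same commit timestamp,
    # so BatchIdx (0, 1, 2, ...) disambiguates the order within the batch.
    transaction.insert(
        table="Events",
        columns=("StreamId", "Category", "Ts", "BatchIdx"),
        values=[
            (stream_id, category, spanner.COMMIT_TIMESTAMP, idx)
            for idx, _payload in enumerate(payloads)
        ],
    )

database.run_in_transaction(append_events, "stream-1", "orders", ["e1", "e2"])
```

That would order events within a single transaction, but whether commit timestamps alone give a reliable order across transactions (and across concurrent writers) is exactly what I'm unsure about, given the two passages below.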
The documentation states that "Commit timestamp values are not guaranteed to be unique. Transactions that write to non-overlapping sets of fields might have the same timestamp. Transactions that write to overlapping sets of fields have unique timestamps." - but I'm not quite clear on what constitutes a "set of fields" in this context.
The documentation also states that "The commit timestamp makes creating the changelog easier, because the timestamps can enforce ordering of the changelog entries." - but it's not clear what ordering guarantees commit timestamps provide when there are multiple concurrent writers, or when multiple events are written at the same time.