1 vote

The perils of using a monotonically increasing key (like a traditional timestamp) are pretty clearly laid out in the docs.

What is less clear, at the time of this writing, is the likely impact of using a monotonically decreasing pattern in a key, which is an approach suggested when regularly retrieving "the most recent records first".
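
For concreteness, here is roughly the decreasing pattern I have in mind (my own sketch, not something from the docs): a zero-padded reverse timestamp, so that newer rows sort lexicographically first.

```python
# My own illustration of a monotonically decreasing key: a reverse timestamp.
# MAX_MS is an assumed ceiling (milliseconds since the epoch), not anything
# prescribed by Bigtable.
MAX_MS = 10**13

def reverse_ts_key(ts_ms: int) -> str:
    # Zero-pad so lexicographic order matches numeric order; newer events
    # produce smaller keys and therefore sort first in a scan.
    return f"{MAX_MS - ts_ms:013d}"

print(reverse_ts_key(1700000000000))  # "8300000000000"
print(reverse_ts_key(1700000001000))  # "8299999999000" -> newer, sorts earlier
```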

Can anyone speak with authority on the effects of using decreasing keys compared to increasing keys, perhaps: "comparable hotspotting", "reduced hotspotting", or "no hotspotting but causes other undesirable/catastrophic behavior"?

P.S. Granted, I may not (and may never) have "big" enough data to suggest Bigtable as an appropriate datastore choice, but it is unclear to me why Bigtable is described as "a natural fit" for time series data when the "best practices" for a likely reader (i.e. use range scans over keys, probably clustered by timestamps) seem directly inconvenienced by the "best practices" for a likely writer (i.e. don't use timestamps except to the extent that keys can be "de-clustered" by promoted fields, salt shards, or random entropy). But maybe I'm missing something... or perhaps this is just the "state of the art"?

Do you have a specific, concrete use case in mind that would use this, or is this a theoretical question? I'll answer your question below, but the underlying issue driving this question may well have a different answer. Consider asking another question with your specific use case in mind. - Misha Brukman
I generally want to have a historical upsert/append-only archival log of "events" where the events are preferably written to a global log and possibly to several "topic-specific" local logs too. The aim is to be able to read all events within a given timespan. Promoting "topics" seems like a shallow "solution" since I might end up with very few or disproportionately chatty topics -- and deriving a global log from local topic logs alone would seem to undesirably require discovery of unknown topics at query time... Maybe I should be using another (managed) database like Google Cloud SQL? - Justin C. Moore
How many events do you plan to write per second? Do you plan to query only for "all events in a time range" and no other types of queries? You may want to look into BigQuery as another approach; append-only will work very well there, but each query will be a full table scan, so if you're going to have a lot of data, folks create a new table for every day and then join data across days when needed. - Misha Brukman
So the truth is I don't really know what my write rate would be. I had also looked at BigQuery, but beyond doing range scans, I anticipate also wanting to pull out individual event records by ID. Thanks for the tips. - Justin C. Moore

1 Answer

1 vote

Monotonically decreasing keys are similarly bad to monotonically increasing keys: the former will end up hammering the node serving the lexicographically first tablet (region, in HBase terms), while the latter will keep hammering the node serving the last tablet. Those tablets may be assigned to the same node or to different nodes in the cluster.

The ideal access pattern for Bigtable is distributed reads and distributed writes, rather than only-latest-keys or only-earliest-keys.

So, while time itself is monotonically growing, if data is coming from different sources (e.g., monitoring data coming from different servers, or stock prices coming from different tickers, or temperature readings coming from different devices, etc.), then it's possible to combine them into keys such as:

<device-id>#<timestamp-range>

which would be monotonically increasing per device but not globally, especially given thousands or millions of devices sending data in parallel.
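
A minimal sketch of that kind of key (illustrative only; the hourly bucket size and field widths are assumptions, not requirements):

```python
import time

# Assumed bucket size for the <timestamp-range> component: one hour, in ms.
BUCKET_MS = 3600 * 1000

def row_key(device_id, ts_ms=None):
    if ts_ms is None:
        ts_ms = int(time.time() * 1000)
    # Truncate the timestamp to an hourly bucket and zero-pad it so that
    # lexicographic order matches numeric order within a device's rows.
    bucket = (ts_ms // BUCKET_MS) * BUCKET_MS
    return f"{device_id}#{bucket:013d}".encode("utf-8")

# Keys for different devices are spread across the keyspace, so parallel
# writers do not all land on the same tablet, while a per-device prefix scan
# still returns that device's rows in time order.
print(row_key("device-001", 1700000000000))  # b'device-001#1699999200000'
print(row_key("device-xyz", 1700000000000))  # b'device-xyz#1699999200000'
```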