The Cloud Bigtable docs on schema design for time series say:
In the vast majority of cases, time-series queries are accessing a given dataset for a given time period. Therefore, make sure that all of the data for a given time period is stored in contiguous rows, unless doing so would cause hotspotting.
They also describe how a row key can cause hotspotting:
If you're storing a cell phone's battery status, and your row key consists of the word "BATTERY" plus a timestamp, the row key will always increase in sequence. Because Cloud Bigtable stores adjacent row keys on the same server node, all writes will focus only on one node until that node is full, at which point writes will move to the next node in the cluster.
One of the suggested fixes is field promotion:
Move fields from the column data into the row key to make writes non-contiguous.
For example:
BATTERY#20150301124501001 --> BATTERY#Corrie#20150301124501001
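To make the two key layouts concrete, here is a minimal sketch of how I understand the keys are built. The helper names (`timestamp_part`, `row_key`) are my own and not part of any Bigtable API, and I am assuming the timestamp is formatted as YYYYMMDDHHMMSS followed by milliseconds, as in the example above.

```python
from datetime import datetime
from typing import Optional


def timestamp_part(ts: datetime) -> str:
    # Assumed format: YYYYMMDDHHMMSS followed by 3 digits of milliseconds.
    return ts.strftime("%Y%m%d%H%M%S") + f"{ts.microsecond // 1000:03d}"


def row_key(metric: str, ts: datetime, user: Optional[str] = None) -> str:
    # Without a user the key is METRIC#TIMESTAMP (always increasing).
    # With a user (field promotion) the key is METRIC#USER#TIMESTAMP.
    parts = [metric] if user is None else [metric, user]
    parts.append(timestamp_part(ts))
    return "#".join(parts)


ts = datetime(2015, 3, 1, 12, 45, 1, 1000)  # 1000 microseconds = 1 ms
plain = row_key("BATTERY", ts)
promoted = row_key("BATTERY", ts, user="Corrie")
print(plain)
print(promoted)
```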
Questions:
- Field promotion may solve hotspotting, but wouldn't it make querying by time range harder, since rows for the same time period are no longer contiguous?
- On the other hand, is hotspotting avoidable at all if you want to query a range ONLY by timestamp? I don't think so, right?
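To illustrate what I mean by "difficult", here is a rough sketch that simulates a lexicographically sorted key space with a plain Python list. It does not use the Bigtable client at all; `scan` is a hypothetical stand-in for a row-range read, and the users "Jo" and "Sam" are made up for the example. As far as I can tell, with promoted keys a single time-range scan finds nothing, and I would need one scan per promoted value instead.

```python
from bisect import bisect_left


def scan(sorted_keys, start_key, end_key):
    # Stand-in for a row-range read: keys in [start_key, end_key).
    lo = bisect_left(sorted_keys, start_key)
    hi = bisect_left(sorted_keys, end_key)
    return sorted_keys[lo:hi]


# Promoted keys, kept in sorted order like rows in a table.
keys = sorted([
    "BATTERY#Corrie#20150301124501001",
    "BATTERY#Corrie#20150302080000000",
    "BATTERY#Jo#20150301130000000",
    "BATTERY#Sam#20150228235959999",
])

start, end = "20150301", "20150302"  # all of 2015-03-01

# One scan on the old layout (BATTERY#TIMESTAMP) matches no promoted keys.
single = scan(keys, f"BATTERY#{start}", f"BATTERY#{end}")

# With promoted keys, each user needs its own range scan.
per_user = []
for user in ["Corrie", "Jo", "Sam"]:
    per_user += scan(keys, f"BATTERY#{user}#{start}", f"BATTERY#{user}#{end}")

print(single)
print(per_user)
```

So the time-range query still works, but only if I already know every promoted value and fan out one scan per value.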
Thanks!