0
votes

I have a table where I store product item information. The format of the row key is Business Unit UUID + Product ID + product serial #. Each of the row key components is of fixed byte length.

Writes to the table will occur in bursts (possibly hundreds of thousands of records) with a constant BU UUID, but with the Product ID, the serial #, or both changing more or less at random.

Reads from the table will be one row at a time (no scans) with random key components.

My question is: will the BU UUID being fixed during a write burst result in hotspotting a particular node and/or tablet? My understanding is that I should be OK since my overall row key value is not monotonically increasing, but I want to be sure.

1
Did you experience problems, or are you actively experiencing problems? Hotspots can occur in your situation, depending on quite a few variables. The Cloud Bigtable team provides some tooling for hotspot detection, which may help with your specifics. You can raise a support ticket to be added to the tool's whitelist. - Solomon Duskis
I'm not actively experiencing problems, but I'm wondering whether my key design is a priori bad. I've looked at the Google article on time series and I understand the problem there, namely the ever-increasing key value. My key doesn't have that problem, but it does have the property that most of the time only the least significant bytes of the key will be changing. I'm just wondering if that is going to lead to hotspotting. - Dave McCullough

1 Answer

3
votes

As Solomon noted, it is possible that you would observe hotspotting even with a changing key. Whether you do depends on the total number of nodes you have, the write volume, and the size of the rows.

Bigtable will attempt to dynamically rebalance so that the key space is evenly distributed among its servers, but you might see better results if you apply the salting technique described in the Time series schema design documentation: https://cloud.google.com/bigtable/docs/schema-design-time-series#ensure_that_your_row_key_avoids_hotspotting
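
To make the salting idea concrete for this key layout, here is a minimal sketch. It is not taken from the linked documentation: the field widths (16-byte UUID, 8-byte Product ID, 8-byte serial), the bucket count, and the function name `salted_row_key` are all assumptions for illustration. The salt is derived from a hash of the key itself, so a single-row read can recompute it from the same three components.

```python
import hashlib
import uuid

# Assumption: 16 salt buckets. Tune to your node count and write volume.
NUM_SALT_BUCKETS = 16


def salted_row_key(bu_uuid: uuid.UUID, product_id: int, serial: int) -> bytes:
    """Build a fixed-width row key with a one-byte salt prefix.

    Hypothetical layout: salt (1 byte) + BU UUID (16 bytes)
    + Product ID (8 bytes) + serial # (8 bytes).
    """
    key = (
        bu_uuid.bytes
        + product_id.to_bytes(8, "big")
        + serial.to_bytes(8, "big")
    )
    # The salt is a deterministic function of the key, so readers can
    # recompute it without any lookup table.
    salt = hashlib.sha256(key).digest()[0] % NUM_SALT_BUCKETS
    return bytes([salt]) + key


if __name__ == "__main__":
    bu = uuid.UUID("12345678-1234-5678-1234-567812345678")
    # Same BU UUID, varying Product ID and serial, as in a write burst.
    for pid, sn in [(1, 1), (1, 2), (2, 1)]:
        print(salted_row_key(bu, pid, sn).hex())
```

With this layout, a burst that shares one BU UUID is spread across up to `NUM_SALT_BUCKETS` key ranges instead of one contiguous range. The trade-off is that a scan over a single BU would have to fan out across every bucket, which is acceptable here only because the reads are single-row lookups.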

In general, we would recommend trying this out and experimenting if possible. You can generate load and then use the Cloud Key Visualizer (https://cloud.google.com/bigtable/docs/keyvis-overview) to inspect whether you are encountering hotspots, as long as you have enough data available to perform the analysis (https://cloud.google.com/bigtable/docs/keyvis-getting-started#viewing-scan).

You may also find this talk presented at Google Cloud Next 2018 useful: https://www.youtube.com/watch?v=3QHGhnHx5HQ

It describes an approach for doing iterative schema design with the help of the Cloud Key Visualizer.