0
votes

Say I have three shards and use the compound key { x: 1, y: 1 } for a collection, where x has three int values (1, 2, 3) and y is random.
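(For context, the collection was sharded roughly like this in the mongo shell; the test.t6 namespace matches the sh.status() output below.)

    sh.enableSharding("test")
    db.t6.createIndex({ x: 1, y: 1 })              // index supporting the shard key
    sh.shardCollection("test.t6", { x: 1, y: 1 })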

Then I insert the same number of documents for x = 1, x = 2 and x = 3. What I expect is that all chunks in the x = 1 range go to shard1, x = 2 to shard2, and x = 3 to shard3, so I can have query isolation. But the output is unexpected:

test.t6

shard key: { "x" : 1, "y" : 1 }

chunks:
shard0000   5
shard0002   5
shard0001   5
{ "x" : { "$minKey" : 1 }, "y" : { "$minKey" : 1 } } -->> { "x" : 1, "y" : 0 } on : shard0000 Timestamp(2, 0) 
{ "x" : 1, "y" : 0 } -->> { "x" : 1, "y" : 11593 } on : shard0002 Timestamp(3, 0) 
{ "x" : 1, "y" : 11593 } -->> { "x" : 1, "y" : 34257 } on : shard0000 Timestamp(4, 0) 
{ "x" : 1, "y" : 34257 } -->> { "x" : 1, "y" : 56304 } on : shard0002 Timestamp(5, 0) 
{ "x" : 1, "y" : 56304 } -->> { "x" : 1, "y" : 78317 } on : shard0000 Timestamp(6, 0) 
{ "x" : 1, "y" : 78317 } -->> { "x" : 2, "y" : 3976 } on : shard0002 Timestamp(7, 0) 
{ "x" : 2, "y" : 3976 } -->> { "x" : 2, "y" : 26497 } on : shard0000 Timestamp(8, 0) 
{ "x" : 2, "y" : 26497 } -->> { "x" : 2, "y" : 48788 } on : shard0002 Timestamp(9, 0) 
{ "x" : 2, "y" : 48788 } -->> { "x" : 2, "y" : 74377 } on : shard0000 Timestamp(10, 0) 
{ "x" : 2, "y" : 74377 } -->> { "x" : 2, "y" : 99329 } on : shard0002 Timestamp(11, 0) 
{ "x" : 2, "y" : 99329 } -->> { "x" : 3, "y" : 25001 } on : shard0001 Timestamp(11, 1) 
{ "x" : 3, "y" : 25001 } -->> { "x" : 3, "y" : 49652 } on : shard0001 Timestamp(9, 2) 
{ "x" : 3, "y" : 49652 } -->> { "x" : 3, "y" : 72053 } on : shard0001 Timestamp(9, 4) 
{ "x" : 3, "y" : 72053 } -->> { "x" : 3, "y" : 97436 } on : shard0001 Timestamp(10, 2) 
{ "x" : 3, "y" : 97436 } -->> { "x" : { "$maxKey" : 1 }, "y" : { "$maxKey" : 1 } } on : shard0001 Timestamp(10, 3) 

My assumption is that MongoDB isn't that smart: it just balances the number of chunks among nodes and does not take compound key grouping into consideration. Am I right, or am I missing something?

What's the strategy when it balances chunks? I understand how it chooses the "from" side and the "to" side, but the docs don't say anything about how it chooses which chunk to move.

Thanks.


1 Answer

6
votes

My assumption is that MongoDB isn't that smart: it just balances the number of chunks among nodes and does not take compound key grouping into consideration. Am I right, or am I missing something?

You are correct in that the MongoDB server (as of 3.4) does not try to overthink how to distribute chunks by default. A chunk represents a logical range of documents within a shard key range (by default holding up to 64MB of data), and the general goal is a roughly even distribution of data per shard (as proxied by the number of chunks).
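If you want to inspect the chunk-to-shard mapping beyond what sh.status() prints, you can read the cluster metadata directly from a mongos (read-only; the namespace below matches the question, and on 3.4 chunk documents are keyed by the ns field):

    // Chunk ranges and their owning shards live in the config database.
    var configDB = db.getSiblingDB("config")
    configDB.chunks.find({ ns: "test.t6" }, { min: 1, max: 1, shard: 1, _id: 0 }).sort({ min: 1 })

    // The configured chunk size (in MB); no document means the 64MB default.
    configDB.settings.find({ _id: "chunksize" })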

However, to put compound key grouping into context you need to consider how chunk distribution might affect read and write use cases.

Reading data from the sharded cluster

Queries fetch documents from the server in cursor batches which cannot exceed the maximum BSON document size (currently 16MB):

For most queries, the first batch returns 101 documents or just enough documents
to exceed 1 megabyte. Subsequent batch size is 4 megabytes. To override the
default size of the batch, see batchSize() and limit().

Assuming you haven't changed any defaults in either batch or chunk sizes, this means that a range-based query on {x, y} will still be able to fill many batches from a single chunk range on a single targeted shard (or occasionally more than one depending on the size/distribution of documents and chunks).
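For example, a query that includes the x prefix of the shard key can be routed by mongos to only the shards owning the matching chunk ranges; explain() shows which shards were targeted (the exact output shape varies by version):

    // Range query on the shard key; mongos targets only the shards
    // that own chunks for x = 1.
    db.t6.find({ x: 1, y: { $gte: 0, $lt: 50000 } }).explain()
    // Look at the per-shard plans (the "shards" section) in the output
    // to confirm which shards were contacted.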

Writing data to the sharded cluster

One of the main reasons for sharding is to increase your write throughput. Depending on your choice of shard key and how data arrives, there may be benefits in distributing the data for consecutive shard key chunks to different shards to avoid potential hot spots. Since you only have three values for x in your example, having ranges for a given value of x on different shards could improve your throughput by parallelizing writes across shards.

What's the strategy for balancing chunks?

I understand how it chooses the "from" side and the "to" side, but the docs don't say anything about how it chooses which chunk to move.

The strategy for Sharded Collection Balancing is detailed in the MongoDB manual, but the short version is that the balancer waits until certain migration thresholds have been exceeded (based on the difference in chunk counts between the shards with the fewest and the most chunks), and a balancing round then continues until the difference in chunk counts between any two shards for that collection is less than two, or a chunk migration fails.
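You can check the balancer state and the resulting per-shard distribution for a collection from the mongo shell:

    sh.getBalancerState()          // true if the balancer is enabled
    sh.isBalancerRunning()         // true if a balancing round is currently in progress
    db.t6.getShardDistribution()   // data size, document count, and chunk count per shard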

Why isn't the balancer smarter?

It's difficult to generalise balancer policies in a way that will suit all workloads and deployments. Depending on your data distribution, shard key, and access patterns, an approach that is excellent for one use case may not suit yours.

For some discussion on this see SERVER-5047: be smarter about which chunk moves when balancing and related issues.

Some balancing suggestions include:

  • balance based on index order
  • balance based on working set estimate
  • use random shard assignment
  • load based balancing

Most of these suggestions require the balancer to monitor additional metrics across the cluster, which adds complexity and coordination. For example, balancing by some load metric (CPU, RAM, network usage) sounds promising until you consider that these metrics need to be tracked over time (including cross-platform abstractions) and that the balancer would need a more complex policy to define "balance" thresholds and to ignore temporary imbalances caused by access patterns or server restarts.

Are there any alternatives to the default balancer policy?

In general you will probably want to use the default balancer policy; however, if you think there is a more suitable way to balance your data, there are a few approaches to consider:

  1. If you want your data to have some specific shard affinity, there is an advanced sharding option called Sharding Zones (MongoDB 3.4+) or Tag-Aware Sharding (MongoDB 3.2 and older) that allows you to associate ranges of shard key values with specific named shards (see the sketch after this list). Use cases for this are typically more specialised, as tagging can lead to an intentional imbalance of data. Some common use cases include optimising physical resources (e.g. tiered storage for "hot" and "cold" data), location-based separation of data (geo affinity), and balancing unsharded collections across your sharded cluster.

  2. While it's highly encouraged to use the default balancer, it is also possible to disable the balancer and Manually Migrate Chunks using the mongo shell or perhaps by implementing your own balancer script.
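To illustrate option 1 with the example from the question, here is a sketch using the MongoDB 3.4+ zone helpers (on 3.2 and older the equivalents are sh.addShardTag() and sh.addTagRange()). The shard names come from the sh.status() output above; the zone names are arbitrary:

    // Associate each shard with a zone.
    sh.addShardToZone("shard0000", "zone-x1")
    sh.addShardToZone("shard0001", "zone-x2")
    sh.addShardToZone("shard0002", "zone-x3")

    // Zone ranges must cover the full compound shard key;
    // the lower bound is inclusive and the upper bound is exclusive.
    sh.updateZoneKeyRange("test.t6", { x: 1, y: MinKey }, { x: 2, y: MinKey }, "zone-x1")
    sh.updateZoneKeyRange("test.t6", { x: 2, y: MinKey }, { x: 3, y: MinKey }, "zone-x2")
    sh.updateZoneKeyRange("test.t6", { x: 3, y: MinKey }, { x: MaxKey, y: MaxKey }, "zone-x3")

Once the balancer has migrated the affected chunks, all chunks for a given x value live on the zoned shard, and queries that include x are isolated to a single shard.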