1
votes

While browsing mongodb sharding tutorials I came across the following assertion :

"If you use shard key in the query, its going to hit a small number of shards, often only ONE"

On the other hand from some of my earlier elementary knowledge of sharding, I was under the impression that mongos routing service can uniquely point out the target shard if the query is fired on Shard Key. My question is - under what circumstances, a shard key based query stands a chance of hitting multiple shards?

1

1 Answers

2
votes

A query using the shard key will target the subset of shards to retrieve data for your query, but depending on the query and data distribution this could be as few as one or as many as all shards.

Borrowing a helpful image from the MongoDB documentation on shard keys: Shard key example from MongoDB documentation

MongoDB uses the shard key to automatically partition data into logical ranges of shard key values called chunks. Each chunk represents approximately 64MB of data by default, and is associated with a single shard that currently owns that range of shard key values. Chunk counts are balanced across available shards, and there is no expectation of adjacent chunks being on the same shard.

If you query for a shard key value (or range of values) that falls within a single chunk, the mongos can definitely target a single shard.

Assuming chunk ranges as in the image above:

// Targeted query to the shard with Chunk 3
db.collection.find( { x: 50 } )

// Targeted query to the shard with Chunk 4
db.collection.find( {x: { $gte: 200} } )

If your query spans multiple chunk ranges, the mongos can target the subset of shards that contain relevant documents:

// Targeted query to the shard(s) with Chunks 3 and 4
db.collection.find( {x: { $gte: 50} } )

The two chunks in this example will either be on the same shard or two different shards. You can review the explain results for a query to find out more information about which shards were accessed.

It's also possible to construct a query that would require data from all shards (for example, based on a large range of shard key values):

// Query includes data from all chunk ranges
db.collection.find( {x: { $gte: -100} } )

Note: the above information describes range-based sharding. MongoDB also supports hash-based shard keys which will (intentionally) distribute adjacent shard key values to different chunk ranges after hashing. Range queries on hashed shard keys are expected to include multiple shards. See: Hashed vs Ranged Sharding.