Mongo sharding not removing data of sharded collection in source shard

Question

I have MongoDB 3.2.6 installed on 5 machines which all form sharded cluster consisting of 2 shards (each is replica set with primary-secondary-arbiter configuration).

I also have a database with very large collection (~50M records, 200GB) and it was imported through mongos which put it to primary shard along with other collections.

I generated hashed id on that collection which will be my shard key.

After thay I sharded collection with:

> use admin
> db.runCommand( { enablesharding : "my-database" } )

> use my-database
> sh.shardCollection("my-database.my-collection", { "_id": "hashed" } )

Comand returned:

{ "collectionsharded" : "my-database.my-collection", "ok" : 1 }

And it actually started to shard. Status of shard looks like this:

> db.my-collection.getShardingDistribution()
Totals
data : 88.33GiB docs : 45898841 chunks : 2825
Shard my-replica-1 contains 99.89% data, 99.88% docs in cluster, avg obj size on shard : 2KiB
Shard my-replica-2 contains 0.1% data, 0.11% docs in cluster, avg   obj size on shard : 2KiB()

This all looks ok but problem is that when I count my-collection through mongos I see number is increasing.

When I log in to primary replica set (my-replica-1) I see that number of records in my-collection is not decreasing although number in my-replica-2 is increasing (which is expected) so I guess mongodb is not removing chunks from source shard while migrating to second shard.

Does anyone know is this normal and if not why is it happening?

EDIT: Actually now it started to decrease on my-replica-1 although it still grows when counting on mongos (sometimes it goes little down and then up). Maybe this is normal behaviour when migrating large collection, I don't know

Ivan

profesor79 profesor79 · Accepted Answer · 2016-06-09T15:15:42

according to documentation here you are observing a valid situation. When document is moved from a to b it is counted twice as long as a receive confirmation that relocation was successfule.

On a sharded cluster, db.collection.count() can result in an inaccurate count if orphaned documents exist or if a chunk migration is in progress.

To avoid these situations, on a sharded cluster, use the $group stage of the db.collection.aggregate() method to $sum the documents. For example, the following operation counts the documents in a collection:

db.collection.aggregate(
   [
      { $group: { _id: null, count: { $sum: 1 } } }
   ]
)

Mongo sharding not removing data of sharded collection in source shard

1 Answers