I'm confused by the situation and trying to fix this for a couple of days now. I'm running 3 shard on top of three 3-members replica sets (rs0, rs1 and rs2). All is working so far. Data is distributed over the 3 shards as well as cloned within the replica sets.
BUT: importing data into one of the replica set works fine with constantly 40k docs/s but by enabling sharding slows the entire process down to just 1.5k docs/s.
I've populated the data via different methods:
- generated some random data in the mongo shell (running in my mongos)
- JSON data import via mongoimport
- MongoDB dump restore from another server via mongorestore
All of them result in just 1.5k doc/s which is disappointing. The mongod's are physical Xeon boxes with 32GB each, the 3 config servers are virtual servers (40 GB HDD, 2 GB RAM, if that matters), the mongos is running on my app server. By the way, the value of 1.5k inserts/s doesn't depend on the shard key, same behaviour for a dedicated shard key (single field key as well as compound key) as well as hashed shard key on _id field.
I tried a lot, even reinstalled the entire cluster twice. The question is: what is the bottleneck in this setup:
- config servers running on virtual server? -> shouldn't be problematic due to the low resource consumption of config servers
- mongos? -> running multiple Mongos on a dedicated box behind HAproxy might be an alternative, haven't tested that yet
Tag Aware Sharding
– Ori Dar