4
votes

I executed "mongodump" on a 3-shard cluster against a database of about 600 GB, with chunks distributed roughly equally across all 3 shards.

My mongodump command was like this:

mongodump --db mydb123 --authenticationDatabase admin --journal -u root -p password123 -o mydb123


Then I moved the dump to a new 2-shard cluster and ran "mongorestore" there. The restored database is now only 80 GB; I assume this is expected (the data is written back compactly, without the padding and fragmentation the original had). But here is my problem: on the new 2-shard cluster, "sh.status()" does not show any chunks for this database. My mongorestore command was like this:

mongorestore -u root -p newpass123 --authenticationDatabase admin --verbose /data/db/backups/new_dir/mydumpfile

There was no error during the execution of this mongorestore command. The actual output of sh.status() is shown below:

mongos> sh.status()
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3, "minCompatibleVersion" : 3, "currentVersion" : 4, "clusterId" : ObjectId("52efaaa0a83668acafc3bcb0") }
  shards:
    { "_id" : "sh1", "host" : "sh1/hfdvmprmongodb1:27000,hfdvmprmongodb2:27000" }
    { "_id" : "sh2", "host" : "sh2/hfdvmprmongodb1:27001,hfdvmprmongodb2:27001" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "test", "partitioned" : false, "primary" : "sh1" }
    { "_id" : "pricing", "partitioned" : true, "primary" : "sh2" }
    { "_id" : "mokshapoc", "partitioned" : true, "primary" : "sh1" }

mongos> isBalancerRunning()
Tue Feb  4 11:09:39.242 ReferenceError: isBalancerRunning is not defined
mongos> sh.isBalancerRunning()
true

So, the mongorestore completed fully, yet no chunks are shown for the 80 GB database (which was a 600 GB database at the time of the mongodump).

I am very confused by the fact that I do not see any chunks. (The smaller size was expected, and it is indeed much smaller.)
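For anyone diagnosing something similar: one way to check whether any chunk metadata exists for the database is to query the config database directly from a mongos. This is a hedged sketch; "mydb123" is the placeholder database name used above:

```javascript
// In the mongo shell, connected to a mongos.
// Sharding metadata lives in the config database, keyed by namespace.
use config
db.collections.find({ _id: /^mydb123\./ })    // sharded collections in mydb123, if any
db.chunks.find({ ns: /^mydb123\./ }).count()  // number of chunks recorded for mydb123
```

If both queries come back empty, the cluster simply has no sharded collections in that database, so sh.status() has no chunks to report.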

The version on both clusters is the same: MongoDB shell version 2.4.6.

Thank you, vr

1
I'm not near being a MongoDB expert, but it seems to me that I heard somewhere that splits are triggered by inserts. Try inserting a new doc into your sharded collection; maybe that'll help. – facha
I'm suspicious that you restored to a different database than you're checking the status of. Please edit the question to include the mongo or mongos invocation and the subsequent commands, so we can see which db you're attaching to. – Kevin J. Rice
To respond to the comment from Kevin J. Rice (just above): the database name in the mongodump and in the mongorestore was exactly the same. However, I have to comply with corporate policy, so I removed the actual database name and the username/password information from the initial post. Sorry if this created any confusion. Thank you for your input and help. – vrdba
Had you already created and sharded your target database and collections before running the mongorestore? mongorestore does not change any sharding options for the target database, so the lack of chunks may be because your target isn't sharded. – Stennie
@vrdba, the dump created by mongodump doesn't reflect the chunks or the distribution of data. If you want the data distributed across shards when restoring the dump to a sharded cluster, you must shard the collection before running mongorestore. You can find more information here: docs.mongodb.org/manual/tutorial/… – Linda Qin

1 Answer

3
votes

mongodump dumps the data and indexes for all the collections in a database (or in several databases, if you dump more than one). It dumps no metadata other than indexes. This means that if you restore a dump into a collection that is already sharded, the restored data will be sharded (split into chunks and balanced); if you restore into an unsharded collection, it will stay unsharded.

Whether the dump came from a sharded collection does not matter: that information lives in the cluster's config metadata and does not travel with the data.
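As a sketch of the workflow this implies (a hedged example; the collection name "mycoll" and the shard key "myShardKey" are placeholders, not taken from the original post): enable sharding and shard each target collection on the new cluster before running mongorestore, so the restored inserts trigger splits and the balancer can distribute the chunks:

```javascript
// In the mongo shell, connected to a mongos on the NEW cluster,
// BEFORE running mongorestore:
sh.enableSharding("mydb123")                               // allow sharding for the database
sh.shardCollection("mydb123.mycoll", { "myShardKey": 1 })  // placeholder collection and shard key

// Then, from the OS shell, run the restore as before:
//   mongorestore -u root -p newpass123 --authenticationDatabase admin /data/db/backups/new_dir/mydumpfile
// As documents are inserted, chunks are split and balanced across the shards,
// and sh.status() will show chunk counts for mydb123.
```

The shard key should normally match the one used on the source cluster, since the data was originally distributed by that key.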