0
votes

I set up a MongoDB shard cluster on windows for the first time as explained here: http://docs.mongodb.org/manual/tutorial/deploy-shard-cluster/

I used the following script to get up the config servers, mongos instance and mongo data servers.

function get-cnfgServerObj($port, $path) {

    $firstCnfgSrv = New-Object PSObject
    $firstCnfgSrv | Add-Member Noteproperty -Name Port -value $port
    $firstCnfgSrv | Add-Member Noteproperty -Name Path -value $path

    return $firstCnfgSrv;
}

$mongodPath = "..\mongod.exe"
$mongosPath = "..\mongos.exe"
$cnfgServers = @(
    (get-cnfgServerObj -port 20001 -path "..\data\configdb20001"),
    (get-cnfgServerObj -port 20002 -path "..\data\configdb20002"),
    (get-cnfgServerObj -port 20003 -path "..\data\configdb20003")
)

$dataServers = @(
    (get-cnfgServerObj -port 27001 -path "..\data\shdb1"),
    (get-cnfgServerObj -port 27002 -path "..\data\shdb2")
)

# Create the mongo config servers first
$cnfgServers | foreach {

    if((Test-Path $_.Path) -eq $false)
    {
        New-Item -Path $_.Path -ItemType Directory
    }

    $args = "--configsvr --dbpath $($_.Path) --port $($_.Port)"
    start-process $mongodPath $args -windowstyle Normal
}

# Create the mongo servers
$dataServers | foreach {

    if((Test-Path $_.Path) -eq $false)
    {
        New-Item -Path $_.Path -ItemType Directory
    }

    $args = "--dbpath $($_.Path) --port $($_.Port)"
    start-process $mongodPath $args -windowstyle Normal
}

# Create the mongos instances
$mongosArgs = "--configdb localhost:20001,localhost:20002,localhost:20003"
start-process $mongosPath $mongosArgs -windowstyle Normal

After I run the above script, I connected the mongos instance:

mongo --host localhost --port 27017

Later, I added shards and enabled sharding on database and the collection:

// add servers as shards
sh.addShard("localhost:27001")
sh.addShard("localhost:27002")

// Enable sharding on 'foo' db
sh.enableSharding("foo")

// Enable sharding on 'testData' collection of 'foo' database
sh.shardCollection("foo.testData", { "x": 1, "_id": 1 })

Finally, I run the following comments inside the mongo shell (which is connected to mongos instance):

use foo

// init data
for (var i = 1; i <= 2500; i++) db.testData.insert( { x : i } )

Later, I connected to one of my data servers:

mongo --host localhost --port 27001

When I try to see the count of the documents inside the testData collection, I saw that the number is 2500 which represents the count of all the documents:

use foo
db.testData.count()

Simply, the data is not load balanced across all shards. What am I doing wrong here?

Edit:

Here is the sh.status() command result:

--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "version" : 3,
        "minCompatibleVersion" : 3,
        "currentVersion" : 4,
        "clusterId" : ObjectId("535673272850501ad810ff51")
}
  shards:
        {  "_id" : "shard0000",  "host" : "localhost:27001" }
        {  "_id" : "shard0001",  "host" : "localhost:27002" }
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "test",  "partitioned" : false,  "primary" : "shard0000" }
        {  "_id" : "foo",  "partitioned" : true,  "primary" : "shard0000" }
                foo.testData
                        shard key: { "x" : 1, "_id" : 1 }
                        chunks:
                                shard0001       1
                                shard0000       1
                        { "x" : { "$minKey" : 1 }, "_id" : { "$minKey" : 1 } } -
->> { "x" : 1, "_id" : ObjectId("53567411cb144434eb53f08d") } on : shard0001 Tim
estamp(2, 0)
                        { "x" : 1, "_id" : ObjectId("53567411cb144434eb53f08d")
} -->> { "x" : { "$maxKey" : 1 }, "_id" : { "$maxKey" : 1 } } on : shard0000 Tim
estamp(2, 1)
2

2 Answers

4
votes

I will assume your sharded cluster is setup correctly and you are connecting to it correctly (although I don't know why you are connecting to one mongod and not the mongos, maybe you are inserting to the mongod and not the mongos?)

One reason for the lack of balancing is that your x,_id key is monotonic, a linear range going from lowest to highest, as such the writes will start from the start of the range and so forth.

Since MongoDB sharding is range based it will actually place the entire first range on the first shard until it see fit to actually balance it: http://docs.mongodb.org/manual/tutorial/manage-sharded-cluster-balancer/ which is most likely not the case yet.

Essentially the way to fix this is to choose your shard key: http://docs.mongodb.org/manual/tutorial/choose-a-shard-key/ wisely

1
votes

I guess you are doing nothing wrong here. MongoDB simply has a default chunk size of 64MB, meaning all your data is within one chunk and will thus be located at one shard.

You can modify the chunk size by issueing

db.settings.save( { _id:"chunksize", value: <sizeInMB> } )

which is also explained in the docs.