Setup Details:
mongos: RAM: 8 GB, CPUs: 2
Config Servers (Replica set of 3 config servers): RAM: 4 GB, CPUs: 2
Shard Cluster-1 (Replica set of 3 mongod): RAM: 30 GB, CPUs: 4
Shard Cluster-2 (Replica set of 3 mongod): RAM: 30 GB, CPUs: 4
Sharding: Collection: rptDlp, Key: {incidentOn: "hashed"}
Description:
I have more than 15 million documents in the collection. I am retrieving the last page of results, sorted by an indexed field of type date.
Actual Query:
db.getCollection("rptDlp").find({ incidentOn: { $gte: new Date(1513641600000), $lt: new Date(1516233600000) } })
.sort({ incidentOn: -1 }).skip(15610600).limit(10)
If I execute this query directly against the shard's PRIMARY mongod, it returns results in 14 seconds. Through mongos, however, it takes more than 2 minutes, and the resulting query timeout causes my application to show an error prompt.
If network congestion were the cause, every query should take 2 minutes; but when I retrieve the first page of documents, the results come back in a few seconds.
Explain query result (against mongos):
{
"queryPlanner" : {
"mongosPlannerVersion" : 1,
"winningPlan" : {
"stage" : "SHARD_MERGE_SORT",
"shards" : [
{
"shardName" : "rs0",
"connectionString" : "rs0/172.18.64.47:27017,172.18.64.48:27017,172.18.64.53:27017",
"serverInfo" : {
"host" : "UemCent7x64-70",
"port" : 27017,
"version" : "3.4.10",
"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
},
"plannerVersion" : 1,
"namespace" : "mydatabase.rptDlp",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"incidentOn" : {
"$lt" : ISODate("2018-01-19T06:13:39.000Z")
}
},
{
"incidentOn" : {
"$gte" : ISODate("2017-12-19T00:00:00.000Z")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 10,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 7519340,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"incidentOn" : -1.0
},
"indexName" : "incidentOn_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"incidentOn" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"incidentOn" : [
"(new Date(1516342419000), new Date(1513641600000)]"
]
}
}
}
}
},
"rejectedPlans" : []
},
{
"shardName" : "rs1",
"connectionString" : "rs1/172.18.64.54:27017",
"serverInfo" : {
"host" : "UemCent7x64-76",
"port" : 27017,
"version" : "3.4.10",
"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
},
"plannerVersion" : 1,
"namespace" : "mydatabase.rptDlp",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"incidentOn" : {
"$lt" : ISODate("2018-01-19T06:13:39.000Z")
}
},
{
"incidentOn" : {
"$gte" : ISODate("2017-12-19T00:00:00.000Z")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 10,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 7519340,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "SHARDING_FILTER",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"incidentOn" : -1.0
},
"indexName" : "incidentOn_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"incidentOn" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"incidentOn" : [
"(new Date(1516342419000), new Date(1513641600000)]"
]
}
}
}
}
}
},
"rejectedPlans" : []
}
]
}
},
"ok" : 1.0
}
Explain query result (against mongo shard):
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydatabase.rptDlp",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"incidentOn" : {
"$lt" : ISODate("2018-01-19T06:13:39.000Z")
}
},
{
"incidentOn" : {
"$gte" : ISODate("2017-12-19T00:00:00.000Z")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 10,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 7519340,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"incidentOn" : -1.0
},
"indexName" : "incidentOn_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"incidentOn" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"incidentOn" : [
"(new Date(1516342419000), new Date(1513641600000)]"
]
}
}
}
}
},
"rejectedPlans" : []
},
"serverInfo" : {
"host" : "UemCent7x64-69",
"port" : 27017,
"version" : "3.4.10",
"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
},
"ok" : 1.0
}
Any suggestion would be helpful, thanks in advance.
FINDINGS:
A query with skip() behaves differently when executed via mongos (the router) than when executed directly on a shard.
When skip(n) and limit(m) run directly on a shard, the shard actually skips n records and returns only the m records specified by limit. This is not possible through mongos, because the data may be distributed across multiple shards, so an individual shard may hold fewer than n matching records. Hence, instead of pushing skip(n) down, mongos executes a limit(n + m) query on each shard (adding the skip count n to the limit count m) to collect all candidate records, then applies the skip on the assembled results. Moreover, if the result set is large, mongos fetches the data in batches using the getMore command, which slows performance further.
Per the MongoDB documentation (https://docs.mongodb.com/v3.0/core/sharded-cluster-query-router/): If the query limits the size of the result set using the limit() cursor method, the mongos instance passes that limit to the shards and then re-applies the limit to the result before returning the result to the client.
If the query specifies a number of records to skip using the skip() cursor method, the mongos cannot pass the skip to the shards, but rather retrieves unskipped results from the shards and skips the appropriate number of documents when assembling the complete result. However, when used in conjunction with a limit(), the mongos will pass the limit plus the value of the skip() to the shards to improve the efficiency of these operations.
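The merge behavior described above can be sketched as a small simulation (this is an illustration of the documented behavior, not actual mongos code; the helper name mongosSkipLimit is made up):

```javascript
// Sketch of how mongos handles skip(n) + limit(m) across shards,
// per the documentation quoted above: each shard is asked for its
// first n + m matching documents in sort order; mongos merge-sorts
// those batches (the SHARD_MERGE_SORT stage) and applies the skip itself.
function mongosSkipLimit(shardResults, n, m) {
  // shardResults: one array of sort-key values per shard, each already
  // sorted in descending order (as the per-shard IXSCAN would return them).
  // Step 1: limit(n + m) is pushed down to each shard.
  const perShard = shardResults.map(docs => docs.slice(0, n + m));
  // Step 2: mongos merge-sorts the shard batches.
  const merged = [].concat(...perShard).sort((a, b) => b - a);
  // Step 3: the skip is applied only here, on the assembled result.
  return merged.slice(n, n + m);
}

// Tiny illustration: skip 4, limit 2 over two shards.
const rs0 = [90, 70, 50, 30, 10]; // already sorted descending
const rs1 = [80, 60, 40, 20];
console.log(mongosSkipLimit([rs0, rs1], 4, 2)); // → [ 50, 40 ]
```

With skip(15610600) and limit(10), each shard therefore has to produce up to 15,610,610 documents in sort order before mongos can discard all but 10 of them, which would explain why the last page is so much slower through mongos than the first.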
Is there any way to improve the performance of skip() queries executed via mongos (routers)?