Setup Details:
mongos: RAM: 8 GB, CPUs: 2
Config Servers (Replica set of 3 config servers): RAM: 4 GB, CPUs: 2
Shard Cluster-1 (Replica set of 3 mongod): RAM: 30 GB, CPUs: 4
Shard Cluster-2 (Replica set of 3 mongod): RAM: 30 GB, CPUs: 4
Sharding: Collection: rptDlp, Key: {incidentOn: "hashed"}
Description:
I have more than 15 million documents in the collection. I am retrieving the last page of results, sorted by an indexed field of type date.
Actual Query:
db.getCollection("rptDlp").find({ incidentOn: { $gte: new Date(1513641600000), $lt: new Date(1516233600000) } })
.sort({ incidentOn: -1 }).skip(15610600).limit(10)
If I execute this query directly against the shard's PRIMARY mongod, it returns results in 14 seconds. Through mongos, however, it takes more than 2 minutes, and the resulting query timeout causes my application to show an error prompt.
If network congestion were the cause, every query should take 2 minutes; but when I retrieve the first page of documents, the results come back in a few seconds.
Explain query result (against mongos):
{
"queryPlanner" : {
"mongosPlannerVersion" : 1,
"winningPlan" : {
"stage" : "SHARD_MERGE_SORT",
"shards" : [
{
"shardName" : "rs0",
"connectionString" : "rs0/172.18.64.47:27017,172.18.64.48:27017,172.18.64.53:27017",
"serverInfo" : {
"host" : "UemCent7x64-70",
"port" : 27017,
"version" : "3.4.10",
"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
},
"plannerVersion" : 1,
"namespace" : "mydatabase.rptDlp",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"incidentOn" : {
"$lt" : ISODate("2018-01-19T06:13:39.000Z")
}
},
{
"incidentOn" : {
"$gte" : ISODate("2017-12-19T00:00:00.000Z")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 10,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 7519340,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"incidentOn" : -1.0
},
"indexName" : "incidentOn_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"incidentOn" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"incidentOn" : [
"(new Date(1516342419000), new Date(1513641600000)]"
]
}
}
}
}
},
"rejectedPlans" : []
},
{
"shardName" : "rs1",
"connectionString" : "rs1/172.18.64.54:27017",
"serverInfo" : {
"host" : "UemCent7x64-76",
"port" : 27017,
"version" : "3.4.10",
"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
},
"plannerVersion" : 1,
"namespace" : "mydatabase.rptDlp",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"incidentOn" : {
"$lt" : ISODate("2018-01-19T06:13:39.000Z")
}
},
{
"incidentOn" : {
"$gte" : ISODate("2017-12-19T00:00:00.000Z")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 10,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 7519340,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "SHARDING_FILTER",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"incidentOn" : -1.0
},
"indexName" : "incidentOn_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"incidentOn" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"incidentOn" : [
"(new Date(1516342419000), new Date(1513641600000)]"
]
}
}
}
}
}
},
"rejectedPlans" : []
}
]
}
},
"ok" : 1.0
}
Explain query result (against mongo shard):
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydatabase.rptDlp",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"incidentOn" : {
"$lt" : ISODate("2018-01-19T06:13:39.000Z")
}
},
{
"incidentOn" : {
"$gte" : ISODate("2017-12-19T00:00:00.000Z")
}
}
]
},
"winningPlan" : {
"stage" : "LIMIT",
"limitAmount" : 10,
"inputStage" : {
"stage" : "SKIP",
"skipAmount" : 7519340,
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"incidentOn" : -1.0
},
"indexName" : "incidentOn_-1",
"isMultiKey" : false,
"multiKeyPaths" : {
"incidentOn" : []
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"incidentOn" : [
"(new Date(1516342419000), new Date(1513641600000)]"
]
}
}
}
}
},
"rejectedPlans" : []
},
"serverInfo" : {
"host" : "UemCent7x64-69",
"port" : 27017,
"version" : "3.4.10",
"gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9"
},
"ok" : 1.0
}
Any suggestion would be helpful, thanks in advance.
FINDINGS:
A query with skip() behaves differently when executed via mongos (the router) than when executed directly on a shard.
When skip(n) and limit(m) run directly on a shard, the shard actually skips n records and returns only the m records specified by limit. This is not possible through mongos, because the data may be distributed across multiple shards, so an individual shard may hold fewer than n matching records. Hence, instead of pushing skip(n) down, mongos executes a limit(n + m) query on each shard (adding the skip count n to the limit count m) to collect all candidate records, then applies the skip on the assembled results. Moreover, if the result set is large, mongos fetches the data in batches using the getMore command, which slows performance further.
Per the MongoDB documentation (https://docs.mongodb.com/v3.0/core/sharded-cluster-query-router/): If the query limits the size of the result set using the limit() cursor method, the mongos instance passes that limit to the shards and then re-applies the limit to the result before returning the result to the client.
If the query specifies a number of records to skip using the skip() cursor method, the mongos cannot pass the skip to the shards, but rather retrieves unskipped results from the shards and skips the appropriate number of documents when assembling the complete result. However, when used in conjunction with a limit(), the mongos will pass the limit plus the value of the skip() to the shards to improve the efficiency of these operations.
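The merge behavior described above can be sketched as a small simulation (this is an illustration of the documented behavior, not actual mongos code; the helper name mongosSkipLimit is made up):

```javascript
// Sketch of how mongos handles skip(n) + limit(m) across shards,
// per the documentation quoted above: each shard is asked for its
// first n + m matching documents in sort order; mongos merge-sorts
// those batches (the SHARD_MERGE_SORT stage) and applies the skip itself.
function mongosSkipLimit(shardResults, n, m) {
  // shardResults: one array of sort-key values per shard, each already
  // sorted in descending order (as the per-shard IXSCAN would return them).
  // Step 1: limit(n + m) is pushed down to each shard.
  const perShard = shardResults.map(docs => docs.slice(0, n + m));
  // Step 2: mongos merge-sorts the shard batches.
  const merged = [].concat(...perShard).sort((a, b) => b - a);
  // Step 3: the skip is applied only here, on the assembled result.
  return merged.slice(n, n + m);
}

// Tiny illustration: skip 4, limit 2 over two shards.
const rs0 = [90, 70, 50, 30, 10]; // already sorted descending
const rs1 = [80, 60, 40, 20];
console.log(mongosSkipLimit([rs0, rs1], 4, 2)); // → [ 50, 40 ]
```

With skip(15610600) and limit(10), each shard therefore has to produce up to 15,610,610 documents in sort order before mongos can discard all but 10 of them, which would explain why the last page is so much slower through mongos than the first.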
Is there any way to improve the performance of skip() queries executed via mongos (routers)?