I have a collection which contains documents with nested arrays. There are 1 million documents with thousands of objects in subfield array. Documents are quite large, but for purpose of explaining consider the following two documents:
[
{
"id": "myid",
"field": {
"total": 1,
"subfield": [
{
"somefield": "1000",
"time": "2020-08-06T08:33:57.977+0530",
"val": [
{
"x": "someval",
"a": "val1",
"b": "val2"
}
]
},
{
"somefield": "2000",
"time": "2020-05-08T04:13:27.977+0530",
"val": [
{
"x": "someval2",
"a": "val1",
"b": "val2"
}
]
}
]
}
},
{
"id": "myid2",
"field": {
"total": 1,
"subfield": [
{
"somefield": "1001",
"time": "2020-07-31T10:15:50.184+0530",
"val": [
{
"x": "someval2",
"a": "val1",
"b": "val2"
},
{
"x": "someval2",
"a": "val1",
"b": "val2"
}
]
}
]
}
}
]
Use case:
I need to project only the id of documents with time date(grouped by date) greater than a value and field. subfield.val.b or field. subfield.val.a with a particular value.
I have the query to achieve my use case using $unwind,$toDate,$dateToString operators.
But the use of $unwind for large arrays causes the overall set to use a lot of memory and slows things down. It takes more than 15 minutes now.(I have not added any indexes because even if I create indexes for created, when I run explain for aggregation, the winning query does not use the provided index)
My current query:
db.collection.aggregate([
{
$unwind: {
path: "$field.subfield",
}
},
{
$unwind: {
path: "$field.subfield.val",
}
},
{
$addFields: {
created_at: {
$toDate: "$field.subfield.time"
}
}
},
{
$match: {
$and: [
{
$expr: {
$gt: [
{
"$dateToString": {
"date": "$created_at",
"format": "%Y-%m-%d"
}
},
"2020-04-28"
]
}
},
{
$or: [
{
"field.subfield.val.a": {
"$eq": "val1"
}
},
{
"field.subfield.val.b": {
"$eq": "val1"
}
}
]
}
]
}
},
{
$group: {
_id: "$id"
}
}
])
I need to limit the execution time of the query to less than 30 seconds. I hope the process can be made faster if done without $unwind.
My MongoDB Server version is 4.0.3
What are the other possible optimisations that can be done?
Thanks!
toandfromyou can directly match/query first. second suggestion is why can't you convert this date"2020-04-28"to iso date in client side and then match. - turivishal