Perhaps somebody has a good idea how we can fix or workaround something which looks like a bug in the current implementation of Aggregation Pipelines for MongoDB in Azure Cosmos DB (so yes: We have switched on the feature on our instances).
Short version is: It seems to us that a $match
aggregation stage after a $group
stage does not work. It never returns any results.
This can be reproduced in the following way using any Mongo Console, assuming you are already in a valid DB (using use <some db>
):
Paste the following JavaScript (this line is optional if you created the collection via the Azure Portal; it assumes your collection is called bug
)
db.createCollection("bug");
The add some documents to the collection:
db.bug.insert({ _id: 1, owner: "a", _class: "History" });
db.bug.insert({ _id: 2, owner: "a", _class: "History" });
db.bug.insert({ _id: 3, owner: "a", _class: "DocumentBookmark" });
db.bug.insert({ _id: 4, owner: "a", _class: "Recyclebin" });
db.bug.insert({ _id: 5, owner: "b", _class: "History" });
As you can see, the owner: "a"
has a duplicate History
record, which we want to query for.
Now do the following:
db.bug.aggregate([
{ $match: { _class: "History"} }
]);
This renders the correct result:
globaldb:PRIMARY> db.bug.aggregate([
... { $match: { _class: "History"} }
... ]);
{
"_t" : "AggregationPipelineResponse",
"ok" : 1,
"waitedMS" : NumberLong(0),
"result" : [
{
"_id" : 1,
"owner" : "a",
"_class" : "History"
},
{
"_id" : 2,
"owner" : "a",
"_class" : "History"
},
{
"_id" : 5,
"owner" : "b",
"_class" : "History"
}
]
}
Now add a $group
stage with a count
to find the number of records per owner:
db.bug.aggregate([
{ $match: { _class: "History"} },
{ $group: { _id: "$owner", count: { $sum: 1 }}}
]);
This also returns a correct result:
globaldb:PRIMARY> db.bug.aggregate([
... { $match: { _class: "History"} },
... { $group: { _id: "$owner", count: { $sum: 1 }}}
... ]);
{
"_t" : "AggregationPipelineResponse",
"ok" : 1,
"waitedMS" : NumberLong(0),
"result" : [
{
"_id" : "a",
"count" : NumberLong(2)
},
{
"_id" : "b",
"count" : NumberLong(1)
}
]
}
Now we want to match for records with a count
greater than 1:
db.bug.aggregate([
{ $match: { _class: "History"} },
{ $group: { _id: "$owner", count: { $sum: 1 }}},
{ $match: { count: { $gt: 1 }}}
]);
This returns an empty result set:
globaldb:PRIMARY> db.bug.aggregate([
... { $match: { _class: "History"} },
... { $group: { _id: "$owner", count: { $sum: 1 }}},
... { $match: { count: { $gt: 1 }}}
... ]);
{
"_t" : "AggregationPipelineResponse",
"ok" : 1,
"waitedMS" : NumberLong(0),
"result" : [ ]
}
The same on Mongo DB
Now, to verify these queries are actually correct, I tried it using the mongo:3.4
docker image. The following code will start a new Mongo DB instance on your local machine, so that you can try it out yourself:
$ docker run --name mongobug -d mongo:3.4
ad3010da255b7c15a464fa21ff6519799a5c16cb8af62a0ea564a95780900491
$ docker exec -it mongobug mongo
MongoDB shell version v3.4.10
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.10
Welcome to the MongoDB shell.
>
Then we'll do the same as with Cosmos above; inside the Mongo Shell, run the following commands:
db.createCollection("bug")
Then insert the test data:
db.bug.insert({ _id: 1, owner: "a", _class: "History" });
db.bug.insert({ _id: 2, owner: "a", _class: "History" });
db.bug.insert({ _id: 3, owner: "a", _class: "DocumentBookmark" });
db.bug.insert({ _id: 4, owner: "a", _class: "Recyclebin" });
db.bug.insert({ _id: 5, owner: "b", _class: "History" });
And now you can see that the query which returned an empty set in fact returns a non-empty aggregate result when running the following aggregate query:
db.bug.aggregate([
{ $match: { _class: "History"} },
{ $group: { _id: "$owner", count: { $sum: 1 }}},
{ $match: { count: { $gt: 1 }}}
]);
The result is the expected:
> db.bug.aggregate([
... { $match: { _class: "History"} },
... { $group: { _id: "$owner", count: { $sum: 1 }}},
... { $match: { count: { $gt: 1 }}}
... ]);
{ "_id" : "a", "count" : 2 }
Extra bonus reading
I also tried to first group by owner
and _class
in a joint group and then $match
; this is obviously by far a much more expensive operation, as Mongo has to group over the entire collection, and not only over already filtered items.
But, unfortunately, this also rendered an empty result, while it worked on the local Mongo docker image:
db.bug.aggregate([
{ $group: { _id: { owner: "$owner", _class: "$_class" }, count: { $sum: 1 } } },
{ $match: { "_id._class": "History", count: { $gt: 1 } } }
]);
Result on Cosmos:
globaldb:PRIMARY> db.bug.aggregate([
... { $group: { _id: { owner: "$owner", _class: "$_class" }, count: { $sum: 1 } } },
... { $match: { "_id._class": "History", count: { $gt: 1 } } }
... ]);
{
"_t" : "AggregationPipelineResponse",
"ok" : 1,
"waitedMS" : NumberLong(0),
"result" : [ ]
}
Result on Mongo DB:
> db.bug.aggregate([
... { $group: { _id: { owner: "$owner", _class: "$_class" }, count: { $sum: 1 } } },
... { $match: { "_id._class": "History", count: { $gt: 1 } } }
... ]);
{ "_id" : { "owner" : "a", "_class" : "History" }, "count" : 2 }
Weird.
tl;dr
Does Cosmos DB have a bug which does not allow for running $match
aggregates after $group
stages?