3
votes

Perhaps somebody has a good idea how we can fix or workaround something which looks like a bug in the current implementation of Aggregation Pipelines for MongoDB in Azure Cosmos DB (so yes: We have switched on the feature on our instances).

Short version is: It seems to us that a $match aggregation stage after a $group stage does not work. It never returns any results.

This can be reproduced in the following way using any Mongo Console, assuming you are already in a valid DB (using use <some db>):

Paste the following JavaScript (this line is optional if you created the collection via the Azure Portal; it assumes your collection is called bug)

db.createCollection("bug");

The add some documents to the collection:

db.bug.insert({ _id: 1, owner: "a", _class: "History" });
db.bug.insert({ _id: 2, owner: "a", _class: "History" });
db.bug.insert({ _id: 3, owner: "a", _class: "DocumentBookmark" });
db.bug.insert({ _id: 4, owner: "a", _class: "Recyclebin" });
db.bug.insert({ _id: 5, owner: "b", _class: "History" });

As you can see, the owner: "a" has a duplicate History record, which we want to query for.

Now do the following:

db.bug.aggregate([
    { $match: { _class: "History"} }
]);

This renders the correct result:

globaldb:PRIMARY> db.bug.aggregate([
...     { $match: { _class: "History"} }
... ]);
{
    "_t" : "AggregationPipelineResponse",
    "ok" : 1,
    "waitedMS" : NumberLong(0),
    "result" : [
        {
            "_id" : 1,
            "owner" : "a",
            "_class" : "History"
        },
        {
            "_id" : 2,
            "owner" : "a",
            "_class" : "History"
        },
        {
            "_id" : 5,
            "owner" : "b",
            "_class" : "History"
        }
    ]
}

Now add a $group stage with a count to find the number of records per owner:

db.bug.aggregate([
    { $match: { _class: "History"} },
    { $group: { _id: "$owner", count: { $sum: 1 }}}
]);

This also returns a correct result:

globaldb:PRIMARY> db.bug.aggregate([
...     { $match: { _class: "History"} },
...     { $group: { _id: "$owner", count: { $sum: 1 }}}
... ]);
{
    "_t" : "AggregationPipelineResponse",
    "ok" : 1,
    "waitedMS" : NumberLong(0),
    "result" : [
        {
            "_id" : "a",
            "count" : NumberLong(2)
        },
        {
            "_id" : "b",
            "count" : NumberLong(1)
        }
    ]
}

Now we want to match for records with a count greater than 1:

db.bug.aggregate([
    { $match: { _class: "History"} },
    { $group: { _id: "$owner", count: { $sum: 1 }}},
    { $match: { count: { $gt: 1 }}}
]);

This returns an empty result set:

globaldb:PRIMARY> db.bug.aggregate([
...     { $match: { _class: "History"} },
...     { $group: { _id: "$owner", count: { $sum: 1 }}},
...     { $match: { count: { $gt: 1 }}}
... ]);
{
    "_t" : "AggregationPipelineResponse",
    "ok" : 1,
    "waitedMS" : NumberLong(0),
    "result" : [ ]
}

The same on Mongo DB

Now, to verify these queries are actually correct, I tried it using the mongo:3.4 docker image. The following code will start a new Mongo DB instance on your local machine, so that you can try it out yourself:

$ docker run --name mongobug -d mongo:3.4
ad3010da255b7c15a464fa21ff6519799a5c16cb8af62a0ea564a95780900491
$ docker exec -it mongobug mongo
MongoDB shell version v3.4.10
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.10
Welcome to the MongoDB shell.
>

Then we'll do the same as with Cosmos above; inside the Mongo Shell, run the following commands:

db.createCollection("bug")

Then insert the test data:

db.bug.insert({ _id: 1, owner: "a", _class: "History" });
db.bug.insert({ _id: 2, owner: "a", _class: "History" });
db.bug.insert({ _id: 3, owner: "a", _class: "DocumentBookmark" });
db.bug.insert({ _id: 4, owner: "a", _class: "Recyclebin" });
db.bug.insert({ _id: 5, owner: "b", _class: "History" });

And now you can see that the query which returned an empty set in fact returns a non-empty aggregate result when running the following aggregate query:

db.bug.aggregate([
    { $match: { _class: "History"} },
    { $group: { _id: "$owner", count: { $sum: 1 }}},
    { $match: { count: { $gt: 1 }}}
]);

The result is the expected:

> db.bug.aggregate([
...     { $match: { _class: "History"} },
...     { $group: { _id: "$owner", count: { $sum: 1 }}},
...     { $match: { count: { $gt: 1 }}}
... ]);
{ "_id" : "a", "count" : 2 }

Extra bonus reading

I also tried to first group by owner and _class in a joint group and then $match; this is obviously by far a much more expensive operation, as Mongo has to group over the entire collection, and not only over already filtered items.

But, unfortunately, this also rendered an empty result, while it worked on the local Mongo docker image:

db.bug.aggregate([
    { $group: { _id: { owner: "$owner", _class: "$_class" }, count: { $sum: 1 } } },
    { $match: { "_id._class": "History", count: { $gt: 1 } } }
]);

Result on Cosmos:

globaldb:PRIMARY> db.bug.aggregate([
...     { $group: { _id: { owner: "$owner", _class: "$_class" }, count: { $sum: 1 } } },
...     { $match: { "_id._class": "History", count: { $gt: 1 } } }
... ]);
{
    "_t" : "AggregationPipelineResponse",
    "ok" : 1,
    "waitedMS" : NumberLong(0),
    "result" : [ ]
}

Result on Mongo DB:

> db.bug.aggregate([
...     { $group: { _id: { owner: "$owner", _class: "$_class" }, count: { $sum: 1 } } },
...     { $match: { "_id._class": "History", count: { $gt: 1 } } }
... ]);
{ "_id" : { "owner" : "a", "_class" : "History" }, "count" : 2 }

Weird.

tl;dr

Does Cosmos DB have a bug which does not allow for running $match aggregates after $group stages?

2
Just run MongoDB in Azure using Atlas! ;-)Kevin Smith
Well, yeah, okay, might even be an option, but it's apparently not available directly from the marketplace, and I do not want an additional place to bill us. And we need API-based provisioning and scaling. I still hope Cosmos can sort this out.donmartin
Ah right, I've heard that not everything works as you'd expect using MongoDB API.Kevin Smith
Unfortunately not, no. But it works well enough that you assume it does, until it doesn't...donmartin

2 Answers

0
votes

Your observations are correct. Multiple $match stages are not yet supported in Cosmos DB. $match has to be the first stage. One of the short-term workarounds until the support is implemented (besides an obvious - dealing with additional filtering on the client side) is to use $out stage and utilize a temporary collection, on which you can run another aggregation pipeline command with another $match.

1
votes

Wanted to provide an update to this thread since version 3.6 has been released of the Azure Cosmos DB for MongoDB API, the issue with aggregations is now returning the correct result.

Initial Query + Result:

db.coll_01.aggregate([
  { $match: { _class: "History"} }
    ]);

Operation consumed 3.18 RUs
{ "_id" : 1, "owner" : "a", "_class" : "History" }
{ "_id" : 2, "owner" : "a", "_class" : "History" }
{ "_id" : 5, "owner" : "b", "_class" : "History" }

Second Query + Result:

db.coll_01.aggregate([
 { $match: { _class: "History"} },
 { $group: { _id: "$owner", count: { $sum: 1 }}}
   ]);

Operation consumed 3.36 RUs
{ "_id" : "a", "count" : 2 }
{ "_id" : "b", "count" : 1 }

And finally the aggregation query + result:

db.coll_01.aggregate([
 { $match: { _class: "History"} },
 { $group: { _id: "$owner", count: { $sum: 1 }}},
 { $match: { count: { $gt: 1 }}}
  ]);

Operation consumed 3.36 RUs
{ "_id" : "a", "count" : 2 }

Please see for more information: Azure Cosmos DB's API for MongoDB (3.6 version): supported features and syntax