1
votes

has anybody experienced issues with the aggregate pipeline? Specifically using the Mongo API? I have quite large query with a few stages, the $group stage is not able to find a particular field in the output of the previous $project stage although if I comment out the $group stage I can definitely see the field I am expecting being output as part of the $project.

The following sets up two collections, each with a single document for demonstration purposes.

db.vacancies.insert({
    "_id" : NUUID("35b3068c-d300-4ae1-bf45-cb00f0f7c37b"),
    "employerAccountId" : "MYJR4X",
    "vacancyReference" : NumberLong(1000000021),
    "status" : "Closed",
    "applicationMethod" : "ThroughPhone",
    "closingDate" : ISODate("2019-03-01T00:00:00.000Z")
});

db.applicationReviews.insert({
    "_id" : NUUID("43a7764a-7d4d-465b-9c30-9934848c009c"),
    "candidateId" : NUUID("84dc9645-605c-4606-812b-117b090020b0"),
    "vacancyReference" : NumberLong(1000000021),
    "status" : "New",
    "createdDate" : ISODate("2018-09-27T13:46:09.801Z")
});

The following is the aggregate pipeline query:

db.vacancies.aggregate([
   {
      "$match":{
         "employerAccountId":"MYJR4X"
      }
   },
   {
      "$lookup":{
         "from":"applicationReviews",
         "localField":"vacancyReference",
         "foreignField":"vacancyReference",
         "as":"application"
      }
   },
   {
      "$unwind":{
         "path":"$application",
         "preserveNullAndEmptyArrays":true
      }
   },
   {
      "$project":{
         "vacancyGuid":"$_id",
         "vacancyReference":1,
         "status":1,
         "appStatus":"$application.status",
         "closingDate":1,
         "applicationMethod":1
      }
   },
   {
      "$project":{
         "vacancyGuid":1,
         "vacancyReference":1,
         "status":1,
         "closingDate":1,
         "applicationMethod":1,
         "isNew":{
            "$cond":{
               "if":{
                  "$eq":[
                     "$appStatus",
                     "New"
                  ]
               },
               "then":1,
               "else":0
            }
         },
         "isSuccessful":{
            "$cond":{
               "if":{
                  "$eq":[
                     "$appStatus",
                     "Successful"
                  ]
               },
               "then":1,
               "else":0
            }
         },
         "isUnsuccessful":{
            "$cond":{
               "if":{
                  "$eq":[
                     "$appStatus",
                     "Unsuccessful"
                  ]
               },
               "then":1,
               "else":0
            }
         }
      }
   },
   {
      "$group":{
         "_id":{
            "vacancyGuid":"$vacancyGuid",
            "vacancyReference":"$vacancyReference",
            "status":"$status",
            "applicationMethod": "$applicationMethod",
            "closingDate":"$closingDate"
         },
         "noOfNewApplications":{
            "$sum":"$isNew"
         },
         "noOfSuccessfulApplications":{
            "$sum":"$isSuccessful"
         },
         "noOfUnsuccessfulApplications":{
            "$sum":"$isUnsuccessful"
         }
      }
   }
]);

I get the following result when using MongoDB:

{
    "_id" : {
        "vacancyGuid" : NUUID("35b3068c-d300-4ae1-bf45-cb00f0f7c37b"),
        "vacancyReference" : NumberLong(1000000021),
        "status" : "Closed",
        "applicationMethod" : "ThroughPhone",
        "closingDate" : ISODate("2019-03-01T00:00:00.000Z")
    },
    "noOfNewApplications" : 1.0,
    "noOfSuccessfulApplications" : 0.0,
    "noOfUnsuccessfulApplications" : 0.0
}

The above is what I expect, however when using Cosmos DB, either hosted on Azure or the Azure Cosmos Emulator I get the following result:

{
    "_id" : {
        "vacancyGuid" : NUUID("35b3068c-d300-4ae1-bf45-cb00f0f7c37b"),
        "vacancyReference" : 1000000021,
        "status" : "Closed",
        "closingDate" : ISODate("2019-03-01T00:00:00.000Z")
    },
    "noOfNewApplications" : 1.0,
    "noOfSuccessfulApplications" : 0.0,
    "noOfUnsuccessfulApplications" : 0.0
}

^^ note the applicationMethod field is not returned.

I have tried the following: - commenting out/removing the group phase of the aggregate pipeline - the projection phases return applicationMethod correctly - aliasing the applicationMethod field in the projection stages to another name e.g. am, didn't make a difference to the grouping. - changing the name of the aliased collection in the as field of the lookup phase to something other than application

I have the following assertions:

  • application is a reserved keyword, which has an effect even when used as a prefix to a field name.
  • application is a reserved keyword in Cosmos only (Is this because of the Application reserved word in SQL? (SQL API))

I know calling my source field on the vacancies collection something other than applicationMethod would solve the issue but that is not practical for existing documents and also the naming I would like to continue using. It seems there is a problem with Cosmos DB and the way it handles the group stage of the aggregate pipeline.

1

1 Answers

1
votes

After raising this with MS they suggested that I start the name of the aliased lookup with something other than application so I chose candidateApplication and then the applicationMethod field I wanted in the group stage response appeared.

They have filed it as something they will correct in the future but the takeaway is make sure that the alias is not called something that is even a partial match on any of the field names you would use in the $group phase of the aggregate pipeline.