3
votes

I am struggling with an aggregation in mongodb. I have the following type of documents:

{
    "_id": "xxxx",
    "workHome": true,
    "commute": true,
    "tel": false,
    "weekend": true,
    "age":39
},
{
    "_id": "yyyy",
    "workHome": false, 
    "commute": true, 
    "tel": false, 
    "weekend": true,
    "age":32
},
{
    "_id": "zzzz",
    "workHome": false,
    "commute": false,
    "tel": false,
    "weekend": false,
    "age":27
}

Out of this I want to generate an aggregation by the total number of fields that are "true" in the document. There are a total of 4 boolean fields in the document so I want the query to group them together to generate the following output (as examples from e.g. a collection with 100 documents in total):

0:20
1:30
2:10
3:20
4:20

This means: There is 20 documents out of 100 with 'all false', 30 documents with '1x true', 10 documents with '2x true' etc. up to a total of 'all 4 are true'.

Is there any way to do this with an $aggregate statement? Right now I am trying to $group by the $sum of 'true' values but don't find a way to get the conditional query to work.

1
First post and though you have made some attempt to work towards an anwer, I suggest you read How to ask the perfect question. Why? really nice you gave us some data so a "plus" there. Also really nice you gave an expected outcome, so another "plus" there. But what does not work here is the lack of explanation of how the "outcome" is achieved from the "input" sample you provide. A good question ( not necessarily perfect ) at least has some correlation between the two points there. Use the "edit" link. - Blakes Seven
Does the example output mean 0th document has 20 fields that has true? Should total 100 documents output 100 lines? - suztomo
@gonbe: thanks for the input, edited the question. It should be a $group output, i.e. there is 20 documents out of 100 with 'all false', 30 documents with '1x true', 10 documents with '2x true' etc. up to a total of 'all 4 are true'. - user2650492

1 Answers

1
votes

So assuming that the data is consistent with all the same fields as "workHome", "commute", "tel" and "weekend", then you would proceed with a "logical" evaluation such as this:

db.collection.aggregate([
  { "$project": {
    "mapped": { "$map": {
      "input": ["A","B","C","D"],
      "as": "el",
      "in": { "$cond": [
        { "$eq": [ "$$el", "A" ] },
        "$workHome",
        { "$cond": [
          { "$eq": [ "$$el", "B" ] },
          "$commute",
          { "$cond": [
            { "$eq": [ "$$el", "C" ] },
            "$tel",
            "$weekend"
          ]}
        ]}
      ]}
    }}
  }},
  { "$unwind": "$mapped" },
  { "$group": {
    "_id": "$_id",
    "size": { "$sum": { "$cond": [ "$mapped", 1, 0 ] } }
  }},
  { "$group": {
      "_id": "$size",
      "count": { "$sum": 1 }
  }},
  { "$sort": { "_id": 1 } }
])

From your simple sample this gives:

{ "_id" : 0, "count" : 1 }
{ "_id" : 2, "count" : 1 }
{ "_id" : 3, "count" : 1 }

To break this down, first the $map operator here transposes the values of the fields to an array of the same lenght as the fields themselves. This is done my comparing each element of the "input" to an expected value via $cond and either returning the true condtion where a match, or moving on to the next condition embedded in the false part of this "ternary" operator. This is done until all logical matches are met and results in an array of values from the fields like so, for the first document:

[true,true,false,true]

The next step is to $unwind the array elements for further comparison. This "de-normalizes" into separate documents for each array element, and is usually required in aggregation pipelines when processing arrays.

Once that is done a $group pipeline stage is invoked, in order to assess the "total" of those elements with a true value. The same $cond ternary is used to transform the logical true/falsecondtions into numeric values here and fed to the $sum accumulator for addition.

Since the "grouping key" provided in _id in the $group is the original document _id value, the current totals are per document for those fields that are true. In order to get totals on the "counts" over the whole collection ( or selection ) then the futher $group stage is invoked with the grouping key being the returned "size" of the matched true results from each document.

The $sum accumulator used there simply adds 1 for each match on the grouping key, thus "counting" the number of occurances of each match count.

Finally $sort by the number of matches "key" in to produce some order to the results.


For the record, this is so much nicer with the upcoming release of MongoDB ( as of writing ) which includes the $filter operator:

db.collection.aggregate([
  { "$group": {
    "_id": {
      "$size": {
        "$filter": {
          "input": { "$map": {
            "input": ["A","B","C","D"],
            "as": "el",
            "in": { "$cond": [
              { "$eq": [ "$$el", "A" ] },
              "$workHome",
              { "$cond": [
                { "$eq": [ "$$el", "B" ] },
                "$commute",
                { "$cond": [
                  { "$eq": [ "$$el", "C" ] },
                  "$tel",
                  "$weekend"
                ]}
              ]}
            ]}
          }},
          "as": "el",
          "cond": {
            "$eq": [ "$$el", true ]
          }
        }
      }
    },
    "count": { "$sum": 1 }
  }},  
  { "$sort": { "_id": 1 } }
])

So now just "two" pipeline stages doing the same thing as the original statement that will work from MongoDB 2.6 and above.

Therefore if your own application is in "development" itself, or you are otherwise curious, then take a look at the Development Branch releases where this functionality is available now.