
The $group stage in MongoDB mostly seems to be used to group by values that are the same. For these documents:

{ "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "date" : ISODate("2014-03-01T08:00:00Z") }
{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "date" : ISODate("2014-03-01T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-03-15T09:00:00Z") }
{ "_id" : 4, "item" : "xyz", "price" : 5, "quantity" : 20, "date" : ISODate("2014-04-04T11:21:39.736Z") }
{ "_id" : 5, "item" : "abc", "price" : 10, "quantity" : 10, "date" : ISODate("2014-04-04T21:23:13.331Z") }

{ $group : { _id : "$price" } }

will get me three documents with prices 5, 10 and 20. But what if I want to extend the definition of a group to something more than "equals"? What if I have a dictionary of the following structure:

{
    "tens" : [10,20,30,40],
    "fives" : [5,15,25,35]
}

and want to group all documents with prices in the "tens" array into one document and those with prices in the "fives" array into another? Something like:

{ "_id" : "fives", "quantity" : 30 }
{ "_id" : "tens" , "quantity" : 13 }

Any way to achieve that using the aggregation pipeline without having to resort to Map-Reduce?

Very unclear what you are expecting here. Perhaps you should edit to show the output you expect. - Blakes Seven
Added expected output - ujjwal-gupta
You can put any expression in the _id field; it acts as a transformation that outputs your distinct group key. So why not just use {_id: {$mod: ["$price", 10]}, quantity: {$sum: "$quantity"}}? The aggregation should return: { "_id" : 5, "quantity" : 30 }, { "_id" : 0 , "quantity" : 13 }. I am not posting it as an answer because I cannot test it right now. - dgiugg
Sorry, this was just an example; the values have no mathematical correlation. I just have lists of arbitrary values that need to be grouped as one. Basically, I want to explicitly use the dictionary structure mentioned above for grouping purposes. - ujjwal-gupta

1 Answer


Well you could write something like this:

db.collection.aggregate([
    { "$group": {
        "_id": {
            "$cond": [
                { "$or": [
                    { "$eq": [ "$price", 5 ] },
                    { "$eq": [ "$price", 15 ] },
                    { "$eq": [ "$price", 25 ] },
                    { "$eq": [ "$price", 35 ] }
                ]},
                "fives",
                { "$cond": [
                    { "$or": [
                        { "$eq": [ "$price", 10 ] },
                        { "$eq": [ "$price", 20 ] },
                        { "$eq": [ "$price", 30 ] },
                        { "$eq": [ "$price", 40 ] }
                    ]},
                    "tens",
                    null
                ]}
            ]
        },
        "quantity": { "$sum": "$quantity" }
    }}
])

This essentially looks at each price and determines whether it matches "fives" or "tens" for grouping purposes; anything in neither list falls into a null group. You can even construct that structure programmatically with reasonable ease from your source list.
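As a rough sketch of that programmatic construction, assuming the dictionary from the question is available as a plain object: the names buildGroupKey and buckets below are made up for illustration and are not part of MongoDB. The helper folds the groups into nested $cond expressions, with null as the final fallback:

```javascript
// Hypothetical helper (illustrative name): builds the nested $cond expression
// from a dictionary of { groupName: [values] }.
function buildGroupKey(field, buckets) {
    // Start from the fallback (null) and wrap one $cond per group around it,
    // walking the groups in reverse so the first group ends up outermost.
    return Object.keys(buckets).reverse().reduce(function (elseBranch, name) {
        return {
            "$cond": [
                { "$or": buckets[name].map(function (value) {
                    return { "$eq": [ field, value ] };
                }) },
                name,
                elseBranch
            ]
        };
    }, null);
}

// The dictionary from the question (assumed to be a plain object)
var buckets = {
    "fives": [5, 15, 25, 35],
    "tens": [10, 20, 30, 40]
};

var generatedPipeline = [
    { "$group": {
        "_id": buildGroupKey("$price", buckets),
        "quantity": { "$sum": "$quantity" }
    }}
];

// In the mongo shell: db.collection.aggregate(generatedPipeline)
```

The generated expression has the same shape as the hand-written one: "fives" is tested first, then "tens", and anything else gets null.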

But that is probably still a little verbose, and if you just mean values ending in "fives" or "tens" then just do:

db.collection.aggregate([
    { "$group": {
        "_id": {
            "$cond": [
                { "$eq": [ { "$mod": [ { "$divide": [ "$price", 5 ] }, 2 ] }, 0 ] },
                "tens",
                "fives"
            ]
        },
        "quantity": { "$sum": "$quantity" }
    }}
])

Or use slightly more complex math and conditions to detect values that are neither a "ten" nor a "five", depending on the content. This works fine on the sample you have, but you would need to tweak it and decide what to do with other values such as 27: either round them or just ignore them. ( hint: if the price divided by 5, taken modulo 1, is not equal to 0, then it was not cleanly divisible by 5 )
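A minimal sketch of that hint, assuming you want values that are not multiples of 5 to fall into a null group rather than be rounded. The variable name pipelineWithFallback and the classify function are illustrative only; classify is a plain JavaScript mirror of the same arithmetic for checking the logic outside the database:

```javascript
// Sketch: values not cleanly divisible by 5 get a null grouping key.
var pipelineWithFallback = [
    { "$group": {
        "_id": {
            "$cond": [
                // ( price / 5 ) mod 1 is non-zero when price is not a multiple of 5
                { "$ne": [ { "$mod": [ { "$divide": [ "$price", 5 ] }, 1 ] }, 0 ] },
                null,
                { "$cond": [
                    // ( price / 5 ) mod 2 is zero for multiples of 10
                    { "$eq": [ { "$mod": [ { "$divide": [ "$price", 5 ] }, 2 ] }, 0 ] },
                    "tens",
                    "fives"
                ]}
            ]
        },
        "quantity": { "$sum": "$quantity" }
    }}
];

// Plain JavaScript mirror of the expression above (illustrative, not MongoDB API)
function classify(price) {
    if ((price / 5) % 1 !== 0) return null;
    return (price / 5) % 2 === 0 ? "tens" : "fives";
}

// In the mongo shell: db.collection.aggregate(pipelineWithFallback)
```

With this arrangement a price of 27 lands in the null group, which you could drop afterwards with a $match stage if you would rather ignore such values.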

The basic building block is the $cond ternary operator, an if/then/else expression: it evaluates the first argument ( if ) and returns the second ( then ) where that is true, or the third ( else ) where it is false.

So you just need a logical condition that distinguishes "fives" from "tens" and returns that value as the grouping key. Here the modulo ( $mod ) logic checks whether the price divided by five is an even or odd number, based on the remainder from division by two: even means "tens", odd means "fives".
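To make the arithmetic concrete, here is a plain JavaScript stand-in for what the $cond / $mod / $divide expression and the $sum accumulator compute over the sample documents from the question. It is only an illustration of the logic, not how MongoDB executes the pipeline, and the names groupKey and totals are made up:

```javascript
// Sample documents from the question (dates omitted, they play no part here)
var docs = [
    { "_id": 1, "item": "abc", "price": 10, "quantity": 2 },
    { "_id": 2, "item": "jkl", "price": 20, "quantity": 1 },
    { "_id": 3, "item": "xyz", "price": 5,  "quantity": 10 },
    { "_id": 4, "item": "xyz", "price": 5,  "quantity": 20 },
    { "_id": 5, "item": "abc", "price": 10, "quantity": 10 }
];

// Stand-in for the $cond expression: even multiples of 5 are "tens", odd are "fives"
function groupKey(doc) {
    return (doc.price / 5) % 2 === 0 ? "tens" : "fives";
}

// Stand-in for $group with a $sum accumulator
var totals = {};
docs.forEach(function (doc) {
    var key = groupKey(doc);
    totals[key] = (totals[key] || 0) + doc.quantity;
});

console.log(totals);   // { tens: 13, fives: 30 }
```

The totals match the output asked for in the question.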

So $cond is the key to evaluating a condition and returning the appropriate grouping key. Also look at the math operators for the aggregation framework, as well as the general operator list, to get familiar with how they are used.

There are quite a lot of things you can do without using mapReduce, and you should prefer the aggregation operators wherever that is possible.