1
votes

I have a set of objects that share a description but can differ slightly in amount:

[

    {
    "_id": "101",
    "description": "DD from my employer1",
    "amount": 1000.33
    },
    {
    "_id": "102",
    "description": "DD from my employer1",
    "amount": 1000.34
    },
    {
    "_id": "103",
    "description": "DD from my employer1",
    "amount": 1000.35
    },
    {
    "_id": "104",
    "description": "DD from employer1",
    "amount": 5000.00
    },
    {
    "_id": "105",
    "description": "DD from my employer2",
    "amount": 2000.33
    },
    {
    "_id": "106",
    "description": "DD from my employer2",
    "amount": 2000.33
    },
    {
    "_id": "107",
    "description": "DD from my employer2",
    "amount": 2000.33
    }

]

Below, I am able to group them using the description:

[

    {
        "$group": {
            "_id": {
                "description": "$description"
            },
            "count": {
                "$sum": 1
            },
            "_ids": {
                "$addToSet": "$_id"
            }
        }
    },
    {
        "$match": {
            "count": {
                "$gte": 3
            }
        }
    }

]

Is there a way to include all the amounts in the group (_ids 101, 102, and 103, plus 105, 106, and 107) even if they differ slightly, but exclude the bonus amount, which in the sample above is _id 104?

I don't believe this can be done in the group stage, but is there something that could be done at a later stage to group _ids 101, 102, and 103 together and exclude _id 104? Basically, I want MongoDB to ignore the small differences among 101, 102, and 103 and group them together, since they are paychecks coming from the same employer.

I have been working with $stdDevPop, but can't get a solid formula down.

I am looking for a simple array output of just the _ids.

{ "result": [ "101", "102", "103", "105", "106", "107" ] }

2
You can group by doing some math on the "amount", i.e. rounding to the nearest 1000, and $push the matching _id values to an array. However, presently your "amount" values are "strings". So you either fix that data by converting to a numeric value or live with the JavaScript evaluation of mapReduce. The aggregation framework operators cannot cast a string to numeric. But JavaScript can. Still, you "should" really fix your data. - Neil Lunn
Values are numeric. Mistakenly shown as strings. - Pablo.K
@NeilLunn Thanks. I used the $trunc and then grouped by description and that truncated value and it is doing what I want it to do. - Pablo.K
Actually was just finishing writing up an answer as you typed that. - Neil Lunn

2 Answers

0
votes
db.yourCollectionNameHere.aggregate( [
  { $match: { "amount" : { $lt : 5000 } } },
  { $project: { _id: 1 } },
])

That will return only the _id of every transaction with an amount less than 5000.

0
votes

You can do this by doing some math on the "amount" to round it down to the nearest 1000 and use that as the grouping _id:

db.collection.aggregate([
  { "$group": {
      "_id": {
        "$subtract": [
           { "$trunc": "$amount" },
           { "$mod": [
             { "$trunc": "$amount" }, 
             1000
           ]}
         ]
      },
      "results": { "$push": "$_id" }
  }},
  { "$redact": {
    "$cond": {
      "if": { "$gt": [ { "$size": "$results" }, 1 ] },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }},
  { "$unwind": "$results" },
  { "$group": {
    "_id": null,
    "results": { "$push": "$results" }
  }}
])
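
To see what the grouping key works out to, the same arithmetic can be sketched in plain JavaScript. This is only an illustration of the math, not something the pipeline needs: `bucketOf`, `amounts`, and `buckets` are hypothetical names, and the values are copied from the sample documents in the question.

```javascript
// Plain-JavaScript sketch of the $group key:
//   trunc(amount) - (trunc(amount) % 1000)
// bucketOf is a hypothetical helper name, not a MongoDB function.
function bucketOf(amount) {
  const whole = Math.trunc(amount);  // mirrors { "$trunc": "$amount" }
  return whole - (whole % 1000);     // mirrors the "$subtract" / "$mod" pair
}

// Amounts taken from the sample documents in the question.
const amounts = {
  "101": 1000.33, "102": 1000.34, "103": 1000.35,
  "104": 5000.00,
  "105": 2000.33, "106": 2000.33, "107": 2000.33
};

// Collect the _id values per bucket, much like "$push" does per group.
const buckets = {};
for (const [id, amount] of Object.entries(amounts)) {
  const key = bucketOf(amount);
  (buckets[key] = buckets[key] || []).push(id);
}

console.log(buckets);
```

The 5000.00 bonus lands in a bucket of its own, which is why filtering on a group size greater than 1 drops it.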

If your MongoDB is older than 3.2, $trunc is not available, so you need a longer form that uses $mod to do what $trunc does. And if your MongoDB is older than 2.6, you would use $match on a count rather than $redact. So in the longer form this is:

db.collection.aggregate([
  { "$group": {
      "_id": {
        "$subtract": [
           { "$subtract": [
             "$amount",
             { "$mod": [ "$amount", 1 ] }
           ]},
           { "$mod": [
             { "$subtract": [
               "$amount",
               { "$mod": [ "$amount", 1 ] }
             ]},
             1000
           ]}
         ]
      },
      "results": { "$push": "$_id" },
      "count": { "$sum": 1 }
  }},
  { "$match": { "count": { "$gt": 1 } } },
  { "$unwind": "$results" },
  { "$group": {
    "_id": null,
    "results": { "$push": "$results" }
  }}
])
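
The nested `$subtract` / `$mod` in the long form stands in for `$trunc`: subtracting the remainder of a division by 1 leaves the whole-number part. Here is a quick plain-JavaScript check of that identity on the sample amounts. It is an illustration only; `truncLongForm` is a hypothetical name, and JavaScript's `%` is assumed to behave like `$mod` for these positive values.

```javascript
// Long-form truncation: amount - (amount % 1).
// truncLongForm is a hypothetical helper name used only for this check.
function truncLongForm(amount) {
  return amount - (amount % 1);  // mirrors { "$subtract": [ "$amount", { "$mod": [ "$amount", 1 ] } ] }
}

// Compare against Math.trunc for the amounts in the question's sample data.
for (const amount of [1000.33, 1000.34, 1000.35, 2000.33, 5000.00]) {
  console.log(amount, truncLongForm(amount), Math.trunc(amount));
}
```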

With either pipeline, the output is just the _id values whose amounts fall into a bucket that contains more than one document.

{ "_id" : null, "results" : [ "105", "106", "107", "101", "102", "103" ] }

You could either add a $sort in there or live with sorting the result array in client code.
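
As a rough sketch of where the `$sort` could go, the variant below places it between the `$unwind` and the final `$group`. It also puts `description` in the grouping key, as the asker mentions doing in the comments. It reuses the `$trunc` form, so it assumes MongoDB 3.2 or later. The `pipeline` variable and the `bucket` key name are illustrative only, and relying on `$push` to keep the sorted order is an assumption worth verifying on your version.

```javascript
// Sketch only: group by description plus the 1000-wide amount bucket,
// keep groups with more than one document, then sort the _id values.
const pipeline = [
  { "$group": {
      "_id": {
        "description": "$description",
        // "bucket" is an illustrative key name for the rounded-down amount
        "bucket": {
          "$subtract": [
            { "$trunc": "$amount" },
            { "$mod": [ { "$trunc": "$amount" }, 1000 ] }
          ]
        }
      },
      "results": { "$push": "$_id" },
      "count": { "$sum": 1 }
  }},
  { "$match": { "count": { "$gt": 1 } } },
  { "$unwind": "$results" },
  { "$sort": { "results": 1 } },   // order the unwound _id values
  { "$group": {
      "_id": null,
      "results": { "$push": "$results" }
  }}
];

// In the mongo shell this would be run as:
//   db.collection.aggregate(pipeline)
```

Grouping on `description` as well keeps paychecks from different employers apart even when their amounts happen to fall in the same bucket.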