MongoDB - Select the group in aggregation without specifying fields

Question

Apologies for the long post!

I have a Mongo collection with the following documents:

{
    "_id" : ObjectId("592811e3fab9f74b07139d73"),
    "Name" : "John",
    "Value" : 1,
    "AnotherValue": "12345"
},
{
    "_id" : ObjectId("592811f8fab9f74b07139d78"),
    "Name" : "John",
    "Value" : 5,
    "AnotherValue": "55555"
},
{
    "_id" : ObjectId("59281206fab9f74b07139d7e"),
    "Name" : "John",
    "Value" : 12,
    "AnotherValue": "654321"

},
{
    "_id" : ObjectId("59281217fab9f74b07139d81"),
    "Name" : "Chris",
    "Value" : 3,
    "AnotherValue": "11111"
},
{
    "_id" : ObjectId("59281223fab9f74b07139d85"),
    "Name" : "Steve",
    "Value" : 2,
    "AnotherValue": "22222"
},
{
    "_id" : ObjectId("5928122ffab9f74b07139d87"),
    "Name" : "Steve",
    "Value" : 4,
    "AnotherValue": "33333"
}

I want to query these documents and return the entry for each name with the highest value, so my desired result set (the order doesn't matter) is:

{
    "_id" : ObjectId("59281206fab9f74b07139d7e"),
    "Name" : "John",
    "Value" : 12,
    "AnotherValue": "654321"
},
{
    "_id" : ObjectId("59281217fab9f74b07139d81"),
    "Name" : "Chris",
    "Value" : 3,
    "AnotherValue": "11111"
},
{
    "_id" : ObjectId("5928122ffab9f74b07139d87"),
    "Name" : "Steve",
    "Value" : 4,
    "AnotherValue": "33333"
}

If I wanted to do exactly the same thing in C# I would use:

var result = 
    from item in collection
    orderby item.Value descending
    group item by item.Name into itemGroup
    select itemGroup.First();

Using the aggregation pipeline I have got as far as:

db.getCollection('test').aggregate(
[
    { "$sort" : { "Value" : -1 } }, //sort descendingly by the Value field
    { "$group" : { "_id" : "$Name", "highest" : { "$first" : "$$ROOT" }  } }, //group by name and select the first document in the group (as they are sorted descendingly, this will be the document with the highest value)
])

This gives me the following result set:

{
    "_id" : "Steve",
    "highest" : {
        "_id" : ObjectId("5928122ffab9f74b07139d87"),
        "Name" : "Steve",
        "Value" : 4,
        "AnotherValue": "33333"
    }
},
{
    "_id" : "Chris",
    "highest" : {
        "_id" : ObjectId("59281217fab9f74b07139d81"),
        "Name" : "Chris",
        "Value" : 3,
        "AnotherValue": "11111"
    }
},
{
    "_id" : "John",
   "highest" : {
        "_id" : ObjectId("59281206fab9f74b07139d7e"),
        "Name" : "John",
        "Value" : 12,
        "AnotherValue": "654321"
    }
}

As you can see, I have an array of documents, each containing an "_id" field which is the name and a "highest" field which is the actual document.

This would be represented in C# as:

var result = 
    from item in collection
    orderby item.Value descending
    group item by item.Name into itemGroup
    select new { id = itemGroup.Key, highest = itemGroup.First() };

What I want to know, is it possible to add another step to my pipeline to ensure I only select the actual person document, rather than a group document which contains the person document, and can I do this WITHOUT specifying the fields? I am hoping to write a C# class that will be able to use this query for various different types of object so the fields might not be known (assume that every collection I might want to use this query for DOES have Name and Value fields, they will all have some common properties).

If I'm coming at this in the completely the wrong way then I'm open to totally new suggestions. As long as I get the desired result set at the end I'll be happy.

Thanks in advance for any help.

Aggregations don't scale because they don't play well with sharding. — arboreal84
If you have MongoDB 3.4 you could use $replaceRoot, otherwise you need to specify all the fields with $project. So upgrade if that's a must. But is it really that bad? I see in either case the trade-off being the cost of running through results once more in an aggregation pipeline, or simply processing each returned result in client code. For such trivial usage, I'd just be doing it in client code. — Neil Lunn
$replaceRoot has achieved exactly what I was after. Thank you! — lace.john

lace.john lace.john · Accepted Answer · 2017-05-26T13:21:28

Huge thanks to Neil Lunn who has answered my question in the comments.

https://docs.mongodb.com/manual/reference/operator/aggregation/replaceRoot/

MongoDB 3.4 has a $replaceRoot pipeline option that achieves exactly what I need:

db.getCollection('test').aggregate(
[
    { "$sort" : { "Value" : -1 } }, //sort descendingly by the Value field
    { "$group" : { "_id" : "$Name", "highest" : { "$first" : "$$ROOT" }  } }, //group by name and select the first document in the group (as they are sorted descendingly, this will be the document with the highest value)
    { "$replaceRoot": { newRoot: "$highest" } }
])

Result set:

{
    "_id" : ObjectId("5928122ffab9f74b07139d87"),
    "Name" : "Steve",
    "Value" : 4
},
{
    "_id" : ObjectId("59281217fab9f74b07139d81"),
    "Name" : "Chris",
    "Value" : 3
},
{
     "_id" : ObjectId("59281206fab9f74b07139d7e"),
    "Name" : "John",
    "Value" : 12
}

MongoDB - Select the group in aggregation without specifying fields

1 Answers