Apologies for the long post!
I have a Mongo collection with the following documents:
{
"_id" : ObjectId("592811e3fab9f74b07139d73"),
"Name" : "John",
"Value" : 1,
"AnotherValue": "12345"
},
{
"_id" : ObjectId("592811f8fab9f74b07139d78"),
"Name" : "John",
"Value" : 5,
"AnotherValue": "55555"
},
{
"_id" : ObjectId("59281206fab9f74b07139d7e"),
"Name" : "John",
"Value" : 12,
"AnotherValue": "654321"
},
{
"_id" : ObjectId("59281217fab9f74b07139d81"),
"Name" : "Chris",
"Value" : 3,
"AnotherValue": "11111"
},
{
"_id" : ObjectId("59281223fab9f74b07139d85"),
"Name" : "Steve",
"Value" : 2,
"AnotherValue": "22222"
},
{
"_id" : ObjectId("5928122ffab9f74b07139d87"),
"Name" : "Steve",
"Value" : 4,
"AnotherValue": "33333"
}
I want to query these documents and return the entry for each name with the highest value, so my desired result set (the order doesn't matter) is:
{
"_id" : ObjectId("59281206fab9f74b07139d7e"),
"Name" : "John",
"Value" : 12,
"AnotherValue": "654321"
},
{
"_id" : ObjectId("59281217fab9f74b07139d81"),
"Name" : "Chris",
"Value" : 3,
"AnotherValue": "11111"
},
{
"_id" : ObjectId("5928122ffab9f74b07139d87"),
"Name" : "Steve",
"Value" : 4,
"AnotherValue": "33333"
}
If I wanted to do exactly the same thing in C# I would use:
var result =
from item in collection
orderby item.Value descending
group item by item.Name into itemGroup
select itemGroup.First();
Using the aggregation pipeline I have got as far as:
db.getCollection('test').aggregate(
[
{ "$sort" : { "Value" : -1 } }, //sort descendingly by the Value field
{ "$group" : { "_id" : "$Name", "highest" : { "$first" : "$$ROOT" } } }, //group by name and select the first document in the group (as they are sorted descendingly, this will be the document with the highest value)
])
This gives me the following result set:
{
"_id" : "Steve",
"highest" : {
"_id" : ObjectId("5928122ffab9f74b07139d87"),
"Name" : "Steve",
"Value" : 4,
"AnotherValue": "33333"
}
},
{
"_id" : "Chris",
"highest" : {
"_id" : ObjectId("59281217fab9f74b07139d81"),
"Name" : "Chris",
"Value" : 3,
"AnotherValue": "11111"
}
},
{
"_id" : "John",
"highest" : {
"_id" : ObjectId("59281206fab9f74b07139d7e"),
"Name" : "John",
"Value" : 12,
"AnotherValue": "654321"
}
}
As you can see, I have an array of documents, each containing an "_id" field which is the name and a "highest" field which is the actual document.
This would be represented in C# as:
var result =
from item in collection
orderby item.Value descending
group item by item.Name into itemGroup
select new { id = itemGroup.Key, highest = itemGroup.First() };
What I want to know, is it possible to add another step to my pipeline to ensure I only select the actual person document, rather than a group document which contains the person document, and can I do this WITHOUT specifying the fields? I am hoping to write a C# class that will be able to use this query for various different types of object so the fields might not be known (assume that every collection I might want to use this query for DOES have Name and Value fields, they will all have some common properties).
If I'm coming at this in the completely the wrong way then I'm open to totally new suggestions. As long as I get the desired result set at the end I'll be happy.
Thanks in advance for any help.
$replaceRoot
, otherwise you need to specify all the fields with$project
. So upgrade if that's a must. But is it really that bad? I see in either case the trade-off being the cost of running through results once more in an aggregation pipeline, or simply processing each returned result in client code. For such trivial usage, I'd just be doing it in client code. – Neil Lunn