5
votes

How can I aggregate my MongoDB results by ObjectId date. Example:

Default cursor results:

cursor = [
    {'_id': ObjectId('5220b974a61ad0000746c0d0'),'content': 'Foo'},
    {'_id': ObjectId('521f541d4ce02a000752763a'),'content': 'Bar'},
    {'_id': ObjectId('521ef350d24a9b00077090a5'),'content': 'Baz'},
]

Projected results:

projected_cursor = [
    {'2013-09-08':
        {'_id': ObjectId('5220b974a61ad0000746c0d0'),'content': 'Foo'},
        {'_id': ObjectId('521f541d4ce02a000752763a'),'content': 'Bar'}
    },
    {'2013-09-07':
        {'_id': ObjectId('521ef350d24a9b00077090a5'),'content': 'Baz'}
    }
]

This is what I'm currently using in PyMongo to achieve these results, but it's messy and I'd like to see how I can do it using MongoDB's aggregation framework (or even MapReduce):

cursor = db.find({}, limit=10).sort("_id", pymongo.DESCENDING)
messages = [x for x in cursor]
this_date = lambda x: x['_id'].generation_time.date()
dates = set([this_date(message) for message in messages])
dates_dict = {date: [m for m in messages if this_date(m) == date] for date in dates}

And yes, I know that the easiest way would be to simply add a new date field to each record then aggregate by that, but that's not what I want to do right now.

Thanks!

3

3 Answers

7
votes

There is no way to accomplish what you're asking with mongodb's aggregation framework, because there is no aggregation operator that can turn ObjectId's into something date-like (there is a JIRA ticket, though). You should be able to accomplish what you want using map-reduce, however:

// map function
function domap() {
    // turn ObjectId --> ISODate
    var date = this._id.getTimestamp();
    // format the date however you want
    var year = date.getFullYear();
    var month = date.getMonth();
    var day = date.getDate();

    // yields date string as key, entire document as value
    emit(year+"-"+month+"-"+day, this);
}

// reduce function
function doreduce(datestring, docs) {
    return {"date":datestring, "docs":docs};
}
2
votes

So this doesn't answer my question directly, but I did find a better way to replace all that lambda nonsense above using Python's setdefault:

d = {}
for message in messages:
    key = message['_id'].generation_time.date()
    d.setdefault(key,[]).append(message)

Thanks to @raymondh for the hint in is PyCon talk:

Transforming Code into Beautiful, Idiomatic Python

2
votes

The Jira Ticket pointed out by llovett has been solved, so now you can use date operators like $isoWeek and $year to extract this information from an ObjectId.

Your aggregation would look something like this:

{
    "$project":
        {

            "_id": {
                "$dateFromParts" : {
                    "year": { "$year": "$_id"},
                    "month": { "$month": "$_id"},
                    "day": { "$dayOfMonth": "$_id"}
                }
            }
        }
}