How do I use aggregation operators in a $match in MongoDB (for example $year or $dayOfMonth)?

Question

I have a collection full of documents with a created_date attribute. I'd like to send these documents through an aggregation pipeline to do some work on them. Ideally I would filter them with a $match before doing any other work, so that I can take advantage of indexes; however, I can't figure out how to use the new $year/$month/$dayOfMonth operators in my $match expression.

There are a few examples floating around of how to use these operators in a $project stage, but I'm concerned that placing a $project as the first step in my pipeline means losing access to my indexes (the MongoDB documentation indicates that $match must be the first stage for the pipeline to take advantage of indexes).

Sample data:

{
    post_body: 'This is the body of test post 1',
    created_date: ISODate('2012-09-29T05:23:41Z'),
    comments: 48
}
{
    post_body: 'This is the body of test post 2',
    created_date: ISODate('2012-09-24T12:34:13Z'),
    comments: 10
}
{
    post_body: 'This is the body of test post 3',
    created_date: ISODate('2012-08-16T12:34:13Z'),
    comments: 10
}

I'd like to run these through an aggregation pipeline to get the total number of comments on all posts made in September 2012:

{
    aggregate: 'posts',
    pipeline: [
         {$match: {
             /* Can I use the $year/$month operators here to match Sept 2012?
             $year:created_date : 2012,
             $month:created_date : 9
             */
             /* or does this have to be
             created_date :
                  {$gte: ISODate('2012-09-01T04:00:00Z'),
                   $lt:  ISODate('2012-10-01T04:00:00Z') }
             */
         }},
         {$group:
             {_id: '0',
              totalComments:{$sum:'$comments'}
             }
          }
    ]
 }

This works, but because the $project comes first, the $match can no longer use any indexes, which becomes a problem for more complicated queries:

{
    aggregate: 'posts',
    pipeline: [
         {$project:
              {
                   comments : 1,
                   month : {$month:'$created_date'},
                   year : {$year:'$created_date'}
              }
         },
         {$match:
              {
                   month:9,
                   year: 2012
               }
         },
         {$group:
             {_id: '0',
              totalComments:{$sum:'$comments'}
             }
          }
    ]
 }
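Whichever pipeline ends up being used, it helps to know the expected answer. Here is a quick plain-JavaScript sketch (no MongoDB involved) that applies the same date-range filter and sum to the three sample documents. `samplePosts` and `totalCommentsInRange` are hypothetical names, JavaScript `Date` stands in for `ISODate`, and the 04:00Z bounds are the ones from the commented-out range query.

```javascript
// The three sample posts from the question, as plain JavaScript objects.
// A JavaScript Date stands in for ISODate here.
const samplePosts = [
  { post_body: "This is the body of test post 1", created_date: new Date("2012-09-29T05:23:41Z"), comments: 48 },
  { post_body: "This is the body of test post 2", created_date: new Date("2012-09-24T12:34:13Z"), comments: 10 },
  { post_body: "This is the body of test post 3", created_date: new Date("2012-08-16T12:34:13Z"), comments: 10 },
];

// Hypothetical helper: sum comments for posts whose created_date falls in
// [start, end). This mirrors a $gte/$lt $match followed by $sum in $group.
function totalCommentsInRange(posts, start, end) {
  return posts
    .filter((p) => p.created_date >= start && p.created_date < end)
    .reduce((sum, p) => sum + p.comments, 0);
}

const septemberTotal = totalCommentsInRange(
  samplePosts,
  new Date("2012-09-01T04:00:00Z"),
  new Date("2012-10-01T04:00:00Z")
);
console.log(septemberTotal); // 58
```

Posts 1 and 2 fall in September (48 + 10), while post 3 is from August, so a correct pipeline should report 58 for this data.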

What type of value do you want to match? Can you give an example with actual date values? — Asya Kamsky
Have added some sample data and an example of what I'm trying to do — Mason
Are those strings or dates? You won't be able to perform date queries on them if they're strings. — cirrus
Do you need to group by month and year for some reason? If so, I would $project AFTER the $match, using the $match from my original code sample; you can always $match again on month and year. Otherwise, you're not indexing month and year, just the date. One alternative is to store the date in two formats: why not store duplicate month and year fields that you can index directly, if that's the way you need to query? I'm still not clear on why you want to filter on month and year fields at all, though. The date query you already have works well for finding posts in September. — cirrus
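The duplicate-fields suggestion could look something like this at write time. This is a minimal sketch, assuming hypothetical field names `created_year`/`created_month` and a made-up helper `withDateParts`. It uses UTC parts because `$year` and `$month` also work in UTC by default.

```javascript
// Hypothetical helper and field names: created_year / created_month are not
// part of the original schema; they are the duplicated fields suggested above.
function withDateParts(post) {
  const d = post.created_date;
  return {
    ...post,
    created_year: d.getUTCFullYear(),
    created_month: d.getUTCMonth() + 1, // getUTCMonth() is 0-based; $month is 1-based
  };
}

// Index spec and $match stage the duplicated fields would allow, shown as
// plain objects (pass them to createIndex / aggregate yourself).
const datePartsIndex = { created_year: 1, created_month: 1 };
const septemberStage = { $match: { created_year: 2012, created_month: 9 } };

const doc = withDateParts({
  post_body: "This is the body of test post 1",
  created_date: new Date("2012-09-29T05:23:41Z"),
  comments: 48,
});
console.log(doc.created_year, doc.created_month); // 2012 9
```

The cost of this design is keeping the extra fields in sync whenever `created_date` changes.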

Asya Kamsky · Accepted Answer · 2012-10-02T18:04:07

As you already found, you cannot $match on fields that are not in the document (it works exactly the same way that find works), and if you use $project first, you will lose the ability to use indexes.
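Newer servers relax the first half of this: since MongoDB 3.6, `$match` (and `find`) accept `$expr`, which evaluates aggregation expressions such as `$year` and `$month` against each document. Below is a hedged sketch that only builds such a stage as a plain JavaScript object (`yearMonthExprMatch` is a hypothetical helper name). It answers the literal question, but `$year`/`$month` are computed per document, so an ordinary index on `created_date` can't narrow the search. The date-range approach remains the index-friendly option.

```javascript
// Hypothetical helper: build a $match stage that uses aggregation operators
// via $expr. Requires MongoDB 3.6+ ($expr did not exist in 2012).
// The $year/$month values are computed for every document the stage reads,
// so an index on created_date cannot answer these comparisons.
function yearMonthExprMatch(year, month) {
  return {
    $match: {
      $expr: {
        $and: [
          { $eq: [{ $year: "$created_date" }, year] },
          { $eq: [{ $month: "$created_date" }, month] },
        ],
      },
    },
  };
}

const stage = yearMonthExprMatch(2012, 9);
console.log(JSON.stringify(stage));
```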

What you can do instead is combine your efforts as follows:

{
    aggregate: 'posts',
    pipeline: [
         {$match: {
             created_date :
                  {$gte: ISODate('2012-09-01T04:00:00Z'),
                   $lt:  ISODate('2012-10-01T04:00:00Z')
                  }
             }
         },
         {$group:
             {_id: '0',
              totalComments:{$sum:'$comments'}
             }
          }
    ]
 }

The above only gives you the aggregation for September. If you want to aggregate over multiple months, you can, for example, do:

{
    aggregate: 'posts',
    pipeline: [
         {$match: {
             created_date :
                  { $gte: ISODate('2012-07-01T04:00:00Z'),
                    $lt:  ISODate('2012-10-01T04:00:00Z')
                  }
             }
         },
         {$project: {
              comments: 1,
              new_created: {
                        "yr" : {"$year" : "$created_date"},
                        "mo" : {"$month" : "$created_date"}
                     }
              }
         },
         {$group:
             {_id: "$new_created",
              totalComments:{$sum:'$comments'}
             }
          }
    ]
 }

and you'll get back something like:

{
    "result" : [
        {
            "_id" : {
                "yr" : 2012,
                "mo" : 7
            },
            "totalComments" : 5
        },
        {
            "_id" : {
                "yr" : 2012,
                "mo" : 8
            },
            "totalComments" : 19
        },
        {
            "_id" : {
                "yr" : 2012,
                "mo" : 9
            },
            "totalComments" : 21
        }
    ],
    "ok" : 1
}
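If you run this for different periods, it may help to generate the date bounds rather than typing them. Below is a sketch, assuming a client-side JavaScript environment (the mongo shell or the Node.js driver, where a JavaScript `Date` is stored as a BSON date just like `ISODate`). `monthStart`, `monthRangeMatch`, `monthlyCommentsPipeline` and the `offsetHours` parameter are hypothetical names. As a variation on the answer, the year/month key is computed directly in `$group`'s `_id`, which accepts expressions, so the separate `$project` stage isn't needed.

```javascript
// Hypothetical helpers: build the pipelines above from year/month numbers
// instead of hand-typed dates. Plain objects only; pass the result to
// db.posts.aggregate(...) in the shell or collection.aggregate(...) in Node.

// First instant of a month, shifted by offsetHours (4 reproduces the 04:00Z
// boundaries used above; 0 means midnight UTC). Date.UTC takes a 0-based
// month and rolls month 12 over into January of the next year.
function monthStart(year, month, offsetHours = 0) {
  return new Date(Date.UTC(year, month - 1, 1, offsetHours));
}

// Range-based $match covering fromMonth through toMonth (both inclusive),
// so an index on created_date can be used.
function monthRangeMatch(fromYear, fromMonth, toYear, toMonth, offsetHours = 0) {
  return {
    $match: {
      created_date: {
        $gte: monthStart(fromYear, fromMonth, offsetHours),
        $lt: monthStart(toYear, toMonth + 1, offsetHours),
      },
    },
  };
}

// Same result shape as the $project + $group version above, but the
// year/month key is computed directly in $group's _id.
function monthlyCommentsPipeline(fromYear, fromMonth, toYear, toMonth, offsetHours = 0) {
  return [
    monthRangeMatch(fromYear, fromMonth, toYear, toMonth, offsetHours),
    {
      $group: {
        _id: { yr: { $year: "$created_date" }, mo: { $month: "$created_date" } },
        totalComments: { $sum: "$comments" },
      },
    },
    // $group does not guarantee output order, so sort chronologically.
    { $sort: { "_id.yr": 1, "_id.mo": 1 } },
  ];
}

// July through September 2012 with the 04:00Z boundaries from the answer.
const pipeline = monthlyCommentsPipeline(2012, 7, 2012, 9, 4);
console.log(pipeline[0].$match.created_date.$lt.toISOString());
```

One caveat applies to the hand-written pipeline as well: `$year`/`$month` evaluate in UTC, so with a nonzero offset the range and the grouping key disagree near the boundaries. For example, a post from 2012-10-01T02:00Z falls inside the July to September range above but is grouped under month 10. On MongoDB 3.6 and later, the date operators also accept a `timezone` option (for example `{$month: {date: "$created_date", timezone: "-04:00"}}`), which is one way to keep the two consistent.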
