0
votes

I have a mongodb index with close to 100k documents. On each document, there are the following 3 fields.

arrayX: [ObjectId] someID: ObjectId timestamp: Date

I have created a compound index for the 3 fields in that order.

When I try to then fire an aggregate query (written below in pseudocode), as

match(
  and(
    arrayX: (elematch: A),
    someId: Y
  )
)
sort (timestamp: 1)

it does not end up using the compound index.

The way I know this is when I use .explain(), the winningPlan stage is FETCH, the inputStage is IXSCAN and the indexname is timestamp_1 which means its only using the other single key index i created for the timestamp field.

What's interesting is that if I remove the sort stage, and keep everything the exact same, mongodb ends up using the compound index.

What gives?

2

2 Answers

0
votes

Multi-key indexes are not useful for sorting. I would expect that a plan using the other index was listed in rejectedPlans.

If you run explain with the allPlansExecution option, the response will also show you the execution times for each plan, among other things.

Since the multi-key index can't be used for sorting the results, that plan would require a blocking sort stage. This means that all of the matching documents must be retrieved and then sorted in memory before sending a response.

On the other hand, using the timestamp_1 index means that documents will be encountered in a presorted order while traversing the index. The tradeoff here is that there is no blocking sort stage, but every document must be examined to see if it matches the query.

For data sets that are not huge, or when the query will match a significant fraction of the collection, the plan without a blocking sort will return results faster.

You might test creating another index on { someID:1, timestamp:1 } as this might reduce the number of documents scanned while still avoiding the blocking sort.

The reason the compound index is selected when you remove the sort stage is because that stage probably accounts for the majority of the execution time.

The fields in the executionStats section of the explain output are explained in Explain Results. Comparing the estimated execution times for each stage may help you determine where you can tune the queries.

0
votes

I am using documents like this (based on the question post) for discussion:

{
  _id: 1,
  fld: "One",
  arrayX: [ ObjectId("5e44f9ed221e963909537848"), ObjectId("5e44f9ed221e963909537849") ],
  someID: ObjectId("5e44f9e7221e963909537845"),
  timestamp: ISODate("2020-02-12T01:00:00.0Z")
}


The Indexes:

I created two indexes, as mentioned in the question post:

{ timestamp: 1 } and { arrayX:1, someID:1, timestamp:1 }


The Query:

db.test.find( 
  { 
      someID: ObjectId("5e44f9e7221e963909537845"), 
      arrayX: ObjectId("5e44f9ed221e963909537848")
  } 
).sort( { timestamp: 1 } )

In the above query I am not using $elemMatch. A query filter using $elemMatch with single field equality condition can be written without the $elemMatch. From $elemMatch Single Query Condition:

If you specify a single query predicate in the $elemMatch expression, $elemMatch is not necessary.


The Query Plan:

I ran the query with explain, and found that the query uses the arrayX_1_someID_1_timestamp_1index. The index is used for the filter as well as the sort operations of the query.

Sample plan details:

"winningPlan" : {
        "stage" : "FETCH",
        "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                        "arrayX" : 1,
                        "someID" : 1,
                        "timestamp" : 1
                },
                "indexName" : "arrayX_1_someID_1_timestamp_1",
...

The IXSCAN specifies that the query uses the index. The FETCH stage specifies that the document is retrieved for getting other details using the index id. This means that both the query's filter as well as the sort use the index. The way to know that sort uses an index is the plan will not have a SORT stage - as in this case.


Reference:

From Sort and Non-prefix Subset of an Index:

An index can support sort operations on a non-prefix subset of the index key pattern. To do so, the query must include equality conditions on all the prefix keys that precede the sort keys.