I have an article model like this:

var ArticleSchema = new Schema({

    type: String
    ,title: String
    ,content: String
    ,hashtags: [String]

    ,comments: [{
        type: Schema.ObjectId
        ,ref: 'Comment'
    }]

    ,replies: [{
        type: Schema.ObjectId
        ,ref: 'Reply'
    }]

    , status: String
    ,statusMeta: {
        createdBy: {
            type: Schema.ObjectId
            ,ref: 'User'
        }
        ,createdDate: Date
        , updatedBy: {
            type: Schema.ObjectId
            ,ref: 'User'
        }
        ,updatedDate: Date

        ,deletedBy: {
            type: Schema.ObjectId,
            ref: 'User'
        }
        ,deletedDate: Date

        ,undeletedBy: {
            type: Schema.ObjectId,
            ref: 'User'
        }
        ,undeletedDate: Date

        ,bannedBy: {
            type: Schema.ObjectId,
            ref: 'User'
        }
        ,bannedDate: Date
        ,unbannedBy: {
            type: Schema.ObjectId,
            ref: 'User'
        }

        ,unbannedDate: Date
    }
}, {minimize: false})

When a user creates or modifies an article, I generate its hashtags:

ArticleSchema.pre('save', true, function(next, done) {
    var self = this
    if (self.isModified('content')) {
        self.hashtags = helper.listHashtagsInText(self.content)
    }
    done()
    return next()
})

For example, if a user writes "Hi, #greeting, i love #friday", I will store ['greeting', 'friday'] in the hashtags list.

I am thinking about creating an index on hashtags to make queries on hashtags faster. But in the Mongoose manual, I found this:

When your application starts up, Mongoose automatically calls ensureIndex for each defined index in your schema. Mongoose will call ensureIndex for each index sequentially, and emit an 'index' event on the model when all the ensureIndex calls succeeded or when there was an error. While nice for development, it is recommended this behavior be disabled in production since index creation can cause a significant performance impact. Disable the behavior by setting the autoIndex option of your schema to false.

http://mongoosejs.com/docs/guide.html

So is indexing faster or slower for mongoDB/Mongoose?

Also, even if I create an index like

  hashtags: { type: [String], index: true }

How can I make use of the index in my query? Or will it just magically become faster for normal queries like:

   Article.find({hashtags: 'friday'})

Did you read the core documentation for .createIndex()? Specifically: "If you call multiple createIndex() methods with the same index specification at the same time, only the first operation will succeed, all other operations will have no effect." Also, indexes cost to write, but they speed up reads. This is a basic concept of an index. There is plenty of documentation out there to explain what indexes do. Perhaps do some reading. - Blakes Seven

@BlakesSeven I am using Mongoose, which is sort of a MongoDB wrapper, I think. The official doc confuses me by suggesting to turn it off in production. - OMGPOP

2 Answers

You are reading it wrong

You are misreading the intent of the quoted block, and what .ensureIndex() (now deprecated, but still called by the mongoose code) actually does in this context.

In mongoose, you define an index either at the schema or model level, as is appropriate to your design. What mongoose "automatically" does for you is that, on connection, it inspects each registered model and then calls the appropriate .ensureIndex() methods for the index definitions provided.

What does this actually do?

In most cases, meaning any time after your application has already been started once and the .ensureIndexes() method has run, the answer is absolutely nothing. That is a bit of an overstatement, but it more or less rings true.

Because the index definition has already been created on the server collection, a subsequent call does not do anything. That is, it does not drop the index and "re-create" it. So the real cost is basically nothing once the index itself has been created.
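To make the "does nothing the second time" point concrete, here is a minimal sketch in plain JavaScript. ToyCollection and its builds counter are hypothetical names for a toy in-memory model; this is not MongoDB's or mongoose's implementation, only an illustration of an idempotent "ensure" call.

```javascript
// Toy in-memory model (an assumption for illustration, NOT MongoDB's code):
// ensureIndex only builds when no index with the same specification exists.
class ToyCollection {
  constructor() {
    this.indexes = new Map(); // index name -> index specification
    this.builds = 0;          // how many real (expensive) builds happened
  }

  ensureIndex(spec) {
    // Derive a name such as "hashtags_1" from the specification.
    const name = Object.entries(spec).map(([k, v]) => `${k}_${v}`).join('_');
    if (this.indexes.has(name)) {
      return { name, created: false }; // already defined: nothing to do
    }
    this.indexes.set(name, spec);
    this.builds += 1; // the costly build happens only on this path
    return { name, created: true };
  }
}

const articles = new ToyCollection();
const first = articles.ensureIndex({ hashtags: 1 });  // first application start
const second = articles.ensureIndex({ hashtags: 1 }); // any later start
console.log(first.created, second.created, articles.builds);
```

In this toy model only the first call pays for a build; every later call with the same specification returns immediately.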

Creating indexes

Since mongoose is just a layer on top of the standard API, the documentation for the createIndex() method contains all the details of what is happening.

There are some details to consider here, such as that an index build can happen in the "background". While this is less intrusive to your application, it comes at its own cost: notably, an index generated in the "background" will be larger than one built in the foreground, which blocks other operations while it runs.

Also all indexes come at a cost, notably in terms of disk usage as well as an additional cost of writing the additional information outside of the collection data itself.

The advantage of an index is that it is much faster to "search" for values contained within the index than to scan through the whole collection and test each document against the conditions.

These are the basic "trade-offs" associated with indexes.
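As a rough illustration of that trade-off, the sketch below models an index on an array field as a map from hashtag to article ids. All names here (insertArticle, findByScan, findByIndex) are hypothetical and the structure is a simplification, not how MongoDB stores its indexes. The point it tries to show is that the lookup answers the same question as the scan, while every insert pays for extra index entries. With a real index in place, a query such as Article.find({hashtags: 'friday'}) does not need to change; the server decides whether to use the index.

```javascript
// Toy model of an index over an array field: hashtag -> set of article ids.
const articleStore = new Map(); // id -> article document
const tagIndex = new Map();     // hashtag -> Set of article ids

function insertArticle(id, hashtags) {
  articleStore.set(id, { id, hashtags });
  // Write cost: one additional index entry per hashtag in the document.
  for (const tag of hashtags) {
    if (!tagIndex.has(tag)) tagIndex.set(tag, new Set());
    tagIndex.get(tag).add(id);
  }
}

// Without an index: inspect every document in the collection.
function findByScan(tag) {
  return [...articleStore.values()]
    .filter((doc) => doc.hashtags.includes(tag))
    .map((doc) => doc.id);
}

// With an index: go straight to the entry for the requested hashtag.
function findByIndex(tag) {
  return [...(tagIndex.get(tag) || [])];
}

insertArticle(1, ['greeting', 'friday']);
insertArticle(2, ['monday']);
insertArticle(3, ['friday']);
console.log(findByScan('friday'), findByIndex('friday'));
```

Both functions return the same ids; the difference is that the scan touches every article while the lookup touches a single index entry.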

Deployment Pattern

Back to the quoted block from the documentation, there is a real intent behind this advice.

It is typical in deployment patterns and particularly with data migrations to do things in this order:

  1. Populate data to relevant collections/tables
  2. Enable indexes on the collection/table data relevant to your needs

This is because there is a cost involved with index creation. As mentioned earlier, it is desirable to get the optimal size from the index build, and also to avoid giving every document insertion the overhead of writing an index entry while you are doing this "load" in bulk.
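The two steps above can be sketched with the same kind of toy model. buildTagIndex and the generated sample documents are assumptions made up for illustration; a real migration would use the database's own bulk load and index build facilities.

```javascript
// Build a hashtag -> article ids map in a single pass over loaded documents.
function buildTagIndex(documents) {
  const index = new Map();
  for (const doc of documents) {
    for (const tag of doc.hashtags) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(doc.id);
    }
  }
  return index;
}

// Step 1: bulk load. No index exists yet, so inserts do no index work.
const loadedArticles = [];
for (let id = 1; id <= 1000; id++) {
  loadedArticles.push({ id, hashtags: id % 2 === 0 ? ['friday'] : ['monday'] });
}

// Step 2: build the index once, after all the data is in place.
const builtIndex = buildTagIndex(loadedArticles);
console.log(builtIndex.get('friday').length);
```

The design choice being illustrated is only the ordering: index maintenance is kept out of the load loop and done once afterwards.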

So that is what indexes are for, those are the costs and benefits and the message in the mongoose documentation is explained.

In general though, I suggest reading up on Database Indexes for what they are and what they do. Think of walking into a library to find a book. There is a card index there at the entrance. Do you walk around the library to find the book you want? Or do you look it up in the card index to find where it is? That index took someone time to create and also keep it updated, but it saves "you" the time of walking around the whole library just so you can find your book.
