Bulk insert performance in MongoDB for large collections

Question

I'm using the BulkWriteOperation (java driver) to store data in large chunks. At first it seems to be working fine, but when the collection grows in size, the inserts can take quite a lot of time.

Currently for a collection of 20M documents, bulk insert of 1000 documents could take about 10 seconds.

Is there a way to make inserts independent of collection size? I don't have any updates or upserts, it's always new data I'm inserting.

Judging from the log, there doesn't seem to be any issue with locks. Each document has a time field which is indexed, but it's linearly growing so I don't see any need for mongo to take the time to reorganize the indexes.

I'd love to hear some ideas for improving the performance

Thanks

Have you given a thought about sharding? Performance depends on lot of parameters like document size, initial data, hw, cluster setup etc. Also check if mongoimport can be used. while doing insert mongo validates the json object, if your document is large then that validation will also take time and can hamper performance in such cases disabling that validation can also help but boost will be minor if documents are small in size. — Nachiket Kate
What was the performance when the collection was 2M docs in size? And what indexes are set up on the collection, what is the average new doc size, what is the physical media, and what is the RAM of the primary? My gut says a smaller-scale infrastructure now has to deal with a bigger workload.... — Buzz Moschetti
Could you try dropping the index, and seeing if it makes any difference to the performance. I suspect it won't, given what you've already said about it, but it would be a useful way of ruling that out as the culprit. — Vince Bowdren
Does your document size change a lot? Perhaps showing us a (few) sample document(s) would help to determine if that can be a problem. — p.streef

glytching glytching · Accepted Answer · 2017-06-23T12:24:20

You believe that the indexing does not require any document reorganisation and the way you described the index suggests that a right handed index is ok. So, indexing seems to be ruled out as an issue. You could of course - as suggested above - definitively rule this out by dropping the index and re running your bulk writes.

Aside from indexing, I would …

Consider whether your disk can keep up with the volume of data you are persisting. More details on this in the Mongo docs
Use profiling to understand what’s happening with your writes

Bulk insert performance in MongoDB for large collections

3 Answers