Assume a MongoDB collection whose documents must regularly be updated with new fields or subobjects; if a document does not exist yet, the same update process should insert it (a typical upsert).
What is the fastest way to achieve this? At the moment I have a three-stage process, which is very slow:
Stage 1: find the documents which must be updated based on a list containing their customIDs (there exists an index on the customID field).
db[myCollection].find({'customID': {'$in': myUpdateList}})
Stage 2: iterate over the documents in the cursor retrieved in Stage 1, enriching them with new fields and/or subobjects. New documents, which cannot be updated because they are not in the database yet, are added to the same document list.
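A minimal sketch of what Stage 2 amounts to, in plain Python. The helper name `merge_enrichments` and the `new_bodies` mapping (customID to the new fields) are hypothetical, introduced only for illustration; the result has the same shape as the `enrichedDocs` dictionary used in Stage 3.

```python
def merge_enrichments(existing_docs, new_bodies):
    """Combine documents found in Stage 1 with new data, keyed by customID.

    existing_docs: iterable of documents (e.g. the Stage 1 cursor).
    new_bodies: hypothetical dict mapping customID -> fields to add.
    """
    enriched = {}
    # Documents already in the database: extend their existing body.
    for doc in existing_docs:
        custom_id = doc['customID']
        if custom_id in new_bodies:
            body = dict(doc.get('enrichedBody', {}))  # copy, do not mutate the input
            body.update(new_bodies[custom_id])
            enriched[custom_id] = {'enrichedBody': body}
    # Documents not in the database yet: add them to the same mapping.
    for custom_id, body in new_bodies.items():
        enriched.setdefault(custom_id, {'enrichedBody': dict(body)})
    return enriched
```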
Stage 3: upsert to MongoDB using an Unordered Bulk Operation.
bulk_mapping = db[myCollection].initialize_unordered_bulk_op()
for key, value in enrichedDocs.items():
    bulk_mapping.find({'customID': key}).upsert().update({'$set': {'customID': key, 'enrichedBody': value['enrichedBody']}})
bulk_mapping.execute()
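For comparison, here is a hedged sketch of the same Stage 3 upsert written with PyMongo's `bulk_write` and `UpdateOne(..., upsert=True)`, which newer PyMongo releases provide in place of `initialize_unordered_bulk_op()`. The helper names `build_upsert_specs` and `upsert_enriched` are hypothetical. The sketch assumes `enrichedDocs` has the shape used above, and it leaves `customID` out of `$set` because an upsert already copies equality fields from the filter into the inserted document.

```python
def build_upsert_specs(enriched_docs):
    """Return one (filter, update) pair per customID; pure Python, no database needed."""
    return [
        ({'customID': key}, {'$set': {'enrichedBody': value['enrichedBody']}})
        for key, value in enriched_docs.items()
    ]


def upsert_enriched(collection, enriched_docs):
    """Send all upserts to MongoDB in a single unordered bulk write."""
    # Imported here so build_upsert_specs can be used without PyMongo installed.
    from pymongo import UpdateOne

    ops = [
        UpdateOne(flt, upd, upsert=True)
        for flt, upd in build_upsert_specs(enriched_docs)
    ]
    if not ops:
        return None  # bulk_write raises InvalidOperation on an empty request list
    # ordered=False lets the server continue past individual failures.
    return collection.bulk_write(ops, ordered=False)
```

It would be called as `result = upsert_enriched(db[myCollection], enrichedDocs)`; the returned result object exposes counters such as `upserted_count` and `modified_count`.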