Duplicate documents are expected to be inserted into the MongoDB collection, so an index was created with unique=True and dropDups=True.

db.myCollection.create_index("timestamp", unique=True, dropDups=True)

However, if the same set of documents is inserted twice, the first insert succeeds but the second one raises an error:

db.myCollection.insert(json.loads(df.to_json()).values())

DuplicateKeyError: E11000 duplicate key error index: myDb.myCollection.$timestamp_1 dup key: { : 1385290560000000000 }

I am confused as to why dropDups=True is not working.

Why not do bulk inserts with continueOnError set to true? - Asya Kamsky

1 Answer

dropDups only affects documents that already exist in the collection: duplicates are deleted once, at the time the index is created. It does not suppress the error on later writes. With the unique index in place, inserting a document whose timestamp already exists will always raise DuplicateKeyError if you use insert. You could consider using an upsert instead (an update with upsert enabled, or findAndModify), which can be configured to conditionally apply the new document rather than raising an exception.
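As a rough sketch of the upsert approach, assuming a pymongo version that provides Collection.update_one (3.0 or later) and documents that carry a "timestamp" field but no "_id" of their own. The helper name upsert_by_timestamp is made up for illustration. It uses $setOnInsert so that an existing document is left untouched, which is close to what dropDups was expected to do; swapping in $set would overwrite the stored document instead.

```python
def upsert_by_timestamp(collection, docs):
    """Write each document keyed on its "timestamp" field.

    `collection` is assumed to be a pymongo Collection (or any object with a
    compatible update_one method). Returns the number of documents processed.
    """
    processed = 0
    for doc in docs:
        # Match on the uniquely indexed field. With upsert=True the document
        # is inserted when no match exists; when a match exists,
        # $setOnInsert changes nothing, so no DuplicateKeyError is raised.
        collection.update_one(
            {"timestamp": doc["timestamp"]},
            {"$setOnInsert": doc},
            upsert=True,
        )
        processed += 1
    return processed


# Hypothetical usage with the collection from the question:
# upsert_by_timestamp(db.myCollection, json.loads(df.to_json()).values())
```

This issues one round trip per document, so for large batches a bulk write of the same operations may be preferable.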

If possible, you might also keep a hash set of already-inserted timestamps locally, so that duplicates are filtered out without calling the database at all (you'd need to purge the set occasionally to prevent unbounded growth).
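A minimal sketch of such a local filter, in plain Python. The class name SeenTimestamps, the max_size parameter, and the purge-everything policy are all assumptions made for illustration, not something the original setup prescribes.

```python
class SeenTimestamps:
    """Remembers timestamps already sent to the database."""

    def __init__(self, max_size=100_000):
        self.max_size = max_size
        self._seen = set()

    def filter_new(self, docs):
        """Return only the documents whose timestamp has not been seen."""
        fresh = []
        for doc in docs:
            ts = doc["timestamp"]
            if ts in self._seen:
                continue  # duplicate: skip without touching the database
            if len(self._seen) >= self.max_size:
                # Crude purge to bound memory: forget everything seen so far.
                self._seen.clear()
            self._seen.add(ts)
            fresh.append(doc)
        return fresh
```

After a purge, a timestamp that was seen earlier can pass the filter again, so the unique index (or an upsert) is still needed as a backstop; the set only reduces how often the database is asked.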

Or, don't enable the index until after you've inserted the data (if possible).