12
votes

I want to insert_many() documents into my collection. Some of them may have the same key/value pair (screen_name in my example) as documents that already exist in the collection. I have a unique index on this key, so I get an error.

my_collection.create_index("screen_name", unique = True)

my_collection.insert_one({"screen_name":"user1", "foobar":"lalala"})
# no problem

to_insert = [
    {"screen_name":"user1", "foobar":"foo"}, 
    {"screen_name":"user2", "foobar":"bar"}
]
my_collection.insert_many(to_insert)

# error : 
# File "C:\Program Files\Python\Anaconda3\lib\site-packages\pymongo\bulk.py", line 331, in execute_command 
# raise BulkWriteError(full_result)
# 
# BulkWriteError: batch op errors occurred

I'd like to:

  1. Not get an error
  2. Not change the already existing documents (here {"screen_name":"user1", "foobar":"lalala"})
  3. Insert all the non-already existing documents (here, {"screen_name":"user2", "foobar":"bar"})

Edit: As someone said in a comment, "this question is asking how to do a bulk insert and ignore unique-index errors, while still inserting the successful records. Thus it's not a duplicate with the question how do I do bulk insert". Please reopen it.


2 Answers

20
votes

One solution could be to use the ordered parameter of insert_many and set it to False (default is True):

my_collection.insert_many(to_insert, ordered=False)

From the PyMongo documentation:

ordered (optional): If True (the default) documents will be inserted on the server serially, in the order provided. If an error occurs all remaining inserts are aborted. If False, documents will be inserted on the server in arbitrary order, possibly in parallel, and all document inserts will be attempted.
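To make the difference concrete, here is a toy, pure-Python model of the two modes applied to the question's data. It is only a sketch of the documented behaviour, not how the server works: simulate_insert_many is a hypothetical helper, and it walks the list in order, whereas a real unordered insert may run in arbitrary order.

```python
def simulate_insert_many(existing_keys, docs, ordered=True, key="screen_name"):
    """Toy model of insert_many against a unique index on `key`.

    Returns (inserted_docs, rejected_docs). Hypothetical helper that only
    mimics the documented semantics of the `ordered` flag.
    """
    seen = set(existing_keys)
    inserted, rejected = [], []
    for doc in docs:
        if doc[key] in seen:
            rejected.append(doc)
            if ordered:
                break  # ordered: the first error aborts all remaining inserts
            continue  # unordered: keep attempting the other documents
        seen.add(doc[key])
        inserted.append(doc)
    return inserted, rejected


to_insert = [
    {"screen_name": "user1", "foobar": "foo"},
    {"screen_name": "user2", "foobar": "bar"},
]

# "user1" already exists, as in the question.
print(simulate_insert_many({"user1"}, to_insert, ordered=True))
print(simulate_insert_many({"user1"}, to_insert, ordered=False))
```

With ordered=True the duplicate user1 stops the batch before user2 is attempted; with ordered=False user2 still goes in.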

Even with ordered=False, you still have to handle the BulkWriteError that is raised when some of the documents could not be inserted.

Depending on your use-case, you could decide to either pass, log a warning, or inspect the exception.
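As a minimal sketch of the "inspect the exception" option: the helper names below (split_write_errors, insert_skipping_duplicates) are made up for illustration. The sketch assumes the per-document failures are listed under "writeErrors" in BulkWriteError.details and that duplicate-key violations carry the server error code 11000. It does not look at write concern errors.

```python
DUPLICATE_KEY = 11000  # MongoDB server error code for a unique-index violation


def split_write_errors(details):
    """Split the "writeErrors" list of a BulkWriteError.details dict.

    Returns (duplicates, others), where duplicates are the entries whose
    error code is the duplicate-key code.
    """
    errors = details.get("writeErrors", [])
    duplicates = [e for e in errors if e.get("code") == DUPLICATE_KEY]
    others = [e for e in errors if e.get("code") != DUPLICATE_KEY]
    return duplicates, others


def insert_skipping_duplicates(collection, docs):
    """Insert docs unordered; return how many were skipped as duplicates."""
    # Imported here so split_write_errors stays usable without pymongo.
    from pymongo.errors import BulkWriteError

    try:
        collection.insert_many(docs, ordered=False)
        return 0
    except BulkWriteError as exc:
        duplicates, others = split_write_errors(exc.details)
        if others:
            raise  # something other than a duplicate key went wrong
        return len(duplicates)
```

With the question's data, insert_skipping_duplicates(my_collection, to_insert) should insert user2, leave the existing user1 document untouched, and return 1. Re-raising on any non-duplicate error keeps real failures visible instead of silently swallowing them.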

1
votes

ordered=False still works. The PyMongo documentation still says "documents will be inserted on the server in arbitrary order, possibly in parallel, and all document inserts will be attempted."