1
votes

I have a collection called englishWords with a unique index on the "word" field. When I run this:

from pymongo import MongoClient

tasovshik = MongoClient()
db = tasovshik.tongler
coll = db.englishWords

f = open('book.txt')
for word in f.read().split():
    coll.insert( { "word": word } } )

I get this error message:

pymongo.errors.DuplicateKeyError: E11000 duplicate key error index: tongler.englishWords.$word_1 dup key: { : "Harry" }

The loop stops inserting as soon as it reaches the first word that already exists.

I do not want to implement an existence check myself. I want to rely on the unique index and have the remaining words inserted without the loop failing.

3 Answers

3
votes

You could do the following:

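import pymongo

for word in f.read().split():
    try:
        # insert_one replaces the insert() method deprecated in PyMongo 3
        coll.insert_one({"word": word})
    except pymongo.errors.DuplicateKeyError:
        # The word is already in the collection; skip it
        continue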

This skips words that already exist instead of aborting the loop on the first duplicate.

Also, did you drop the collection before trying?
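The try/except pattern above can be wrapped in a small helper that also reports how many words were skipped. This is only a sketch: `insert_skipping_duplicates`, `FakeCollection` and `FakeDuplicateKeyError` are hypothetical names, and the fake classes are in-memory stand-ins so the example runs without a MongoDB server.

```python
def insert_skipping_duplicates(coll, words, duplicate_error):
    """Insert each word; skip the ones the unique index rejects."""
    inserted = skipped = 0
    for word in words:
        try:
            coll.insert_one({"word": word})
            inserted += 1
        except duplicate_error:
            skipped += 1
    return inserted, skipped


class FakeDuplicateKeyError(Exception):
    """Hypothetical stand-in for pymongo.errors.DuplicateKeyError."""


class FakeCollection:
    """Hypothetical in-memory stand-in mimicking a unique index on "word"."""

    def __init__(self):
        self.words = set()

    def insert_one(self, doc):
        if doc["word"] in self.words:
            raise FakeDuplicateKeyError(doc["word"])
        self.words.add(doc["word"])


print(insert_skipping_duplicates(
    FakeCollection(), "Harry met Harry".split(), FakeDuplicateKeyError))
# prints (2, 1)
```

With a real collection you would pass `coll` and `pymongo.errors.DuplicateKeyError` instead of the stand-ins.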

2
votes

To avoid unnecessary exception handling, you could do an upsert:

from pymongo import MongoClient

tasovshik = MongoClient()
db = tasovshik.tongler
coll = db.englishWords

with open('book.txt') as f:
    for word in f.read().split():
        # upsert=True: insert the document when no match is found
        coll.replace_one({'word': word}, {'word': word}, upsert=True)

The last argument (the upsert flag) tells MongoDB to insert the document if no matching one already exists.

Here's the documentation.


EDIT: For even faster performance on a long list of words, you could do it in bulk like this:

from pymongo import MongoClient

tasovshik = MongoClient()
db = tasovshik.tongler
coll = db.englishWords

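# Legacy bulk API: removed in PyMongo 4
bulkop = coll.initialize_unordered_bulk_op()
with open('book.txt') as f:
    for word in f.read().split():
        # upsert() must be followed by a write; $setOnInsert only writes on insert
        bulkop.find({'word': word}).upsert().update_one({'$setOnInsert': {'word': word}})

bulkop.execute()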

Taken from the bulk operations documentation.
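On newer PyMongo versions the same idea can be expressed with `bulk_write` and `UpdateOne` rather than the bulk-op builder. The following is a minimal sketch, not the answer's original method: `upsert_spec` and `upsert_words` are hypothetical helper names, and it assumes PyMongo 3 or later is installed.

```python
def upsert_spec(word):
    """Return (filter, update) for an insert-if-missing upsert of one word."""
    return {"word": word}, {"$setOnInsert": {"word": word}}


def upsert_words(coll, words):
    """Send all upserts in one unordered bulk_write; return how many were new."""
    from pymongo import UpdateOne  # imported lazily; assumes PyMongo 3+

    requests = [UpdateOne(*upsert_spec(w), upsert=True)
                for w in dict.fromkeys(words)]  # drop repeats, keep order
    if not requests:  # bulk_write rejects an empty request list
        return 0
    return coll.bulk_write(requests, ordered=False).upserted_count
```

It would be called as `upsert_words(db.englishWords, f.read().split())`. Using `$setOnInsert` means existing documents are left untouched, and `ordered=False` lets the server continue past any individual failure.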

0
votes

I've just run your code and everything looks good except that you have an extra } on the last line. Delete that, and you don't have to drop any collection. Every insert creates its own batch of data, so there is no need to drop the previous collection.

The error message indicates that the key "Harry" has already been inserted and you are trying to insert it again. Is this perhaps not your entire code?
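One thing worth checking is whether the input itself repeats words, since a word that occurs twice in the text triggers the duplicate key error even on an empty collection. A small sketch for listing such words (`repeated_words` is a hypothetical helper name):

```python
from collections import Counter


def repeated_words(text):
    """Return the words that occur more than once, with their counts."""
    counts = Counter(text.split())
    return {word: n for word, n in counts.items() if n > 1}


print(repeated_words("Harry looked at Harry"))
# prints {'Harry': 2}
```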