3
votes

Consider a collection with the following documents:

{ 
  name: "John Doe",
  emails: [
    { 
      value: "[email protected]", 
      isValid: true, 
      isPreferred: true 
    }
  ]
},
{ 
  name: "John Doe",
  emails: [ 
    { 
       value: "[email protected]",
       isValid: false,
       isPreferred: false
    },
    { 
       value: "[email protected]",
       isValid: true,
       isPreferred: true
    } 
  ]
}

There should be no users with the same valid and preferred emails, so there is a unique index for that:

db.users.createIndex( { "emails.value": 1 }, { name: "loginEmail", unique: true, partialFilterExpression: { "emails.isValid": true, "emails.isPreferred": true } } )

Adding the following email to the first document triggers the unique constraint violation:

{
  name: "John Doe",
  emails: [
   { 
     value: "[email protected]",
     isValid: false,
     isPreferred: false
   }
  ]
}

Caused by: com.mongodb.MongoCommandException: Command failed with error 11000 (DuplicateKey): 'E11000 duplicate key error collection: profiles.users index: loginEmail dup key: { emails.value: "[email protected]", emails.isValid: false, emails.isPreferred: false }' on server profiles-db-mongodb.dev:27017. The full response is {"ok": 0.0, "errmsg": "E11000 duplicate key error collection: profiles.users index: loginEmail dup key: { emails.value: "[email protected]", emails.isValid: false, emails.isPreferred: false }", "code": 11000, "codeName": "DuplicateKey", "keyPattern": {"emails.value": 1, "emails.isValid": 1, "emails.isPreferred": 1}, "keyValue": {"emails.value": "[email protected]", "emails.isValid": false, "emails.isPreferred": false}}

As I can understand, this happens because the filter expression is applied to the collection, not to the embedded documents, so although being somewhat counterintuitive and unexpected, the index behaves as described.

My question is how can I ensure partial uniqueness without having false negatives?

1

1 Answers

2
votes

TLDR: You cant.

Let's understand why it's happening first, maybe then we'll understand what can be done. The problem originates due to a combination of two Mongo features.

  1. the dot notation syntax. The dot notation syntax allows you to query subdocuments in arrays at ease ("emails.isPreferred": true). However when you want to start using multiple conditions for subdocuments like in your case you need to use something like $elemMatch sadly the restrictions for partialFilterExpression are quite restrictive and do not give you such power. Which means even docs with emails such as:
{
    "_id": ObjectId("5f106c0e823eea49427eea64"),
    "name": "John Doe",
    "emails": [
        {
            "value": "[email protected]",
            "isValid": true,
            "isPreferred": false
        },
        {
            "value": "[email protected]",
            "isValid": false,
            "isPreferred": true
        }
    ]
}

Will be indexed. So ok, We will have some extra indexed documents in the collection but still apart from (falsely) increasing index size you still hope it might work, but it doesn't due to point 2.

  1. multikey indexes:

MongoDB uses multikey indexes to index the content stored in arrays. ... , MongoDB creates separate index entries for every element of the array.

So when you create an index on an array or on any field of a sub document in an array Mongo will "flatten" the array and create a unique entry for each of the documents. and in this case it will create a unique index for all emails in the array.

So due to all these "features" and the restrictions of the partial filter syntax usage we can't really achieve what you want.

So what can you do? I'm sure you're already thinking of possible work arounds through this. A simple solution would be to maintain an extra field that will only contain those isValid and isPreferred emails. then a unique sparse index will do the trick.