2
votes

Lets say I have these documents in my CosmosDB. (DocumentDB API, .NET SDK)

{
    // partition key of the collection
    "userId" : "0000-0000-0000-0000",
    "emailAddresses": [
        "[email protected]", "[email protected]"
    ]
    // some more fields
}

I now need to find out if I have a document for a given email address. However, I need the query to be case insensitive.

There are ways to search case insensitive on a field (they do a full scan however):

How to do a Case Insensitive search on Azure DocumentDb?

select * from json j where LOWER(j.name) = 'timbaktu'
e => e.Id.ToLower() == key.ToLower()

These do not work for arrays. Is there an alternative way? A user defined function looks like it could help.

I am mainly looking for a temporary low-effort solution to support the scenario (I have multiple collections like this). I probably need to switch to a data structure like this at some point:

{
    "userId" : "0000-0000-0000-0000",
    // Option A
    "emailAddresses": [
        {
            "displayName": "[email protected]",
            "normalizedName" : "[email protected]"
        },
        {
            "displayName": "[email protected]",
            "normalizedName" : "[email protected]"
        }
    ],
    // Option B
    "emailAddressesNormalized": {
        "[email protected]", "[email protected]"
    }
}

Unfortunately, my production database already contains documents that would need to be updated to support the new structure. My production collections contain only 100s of these items, so I am even tempted to just get all items and do the comparison in memory on the client.

1
You have already given yourself 3 usable solutions: If you can change the schema, update those 100 documents. If not, scan those 100s of docs. What else do you expect in answers?Imre Pühvel
I can't scan for array properties(server-side). Thats what the question is about.Alex AIT

1 Answers

2
votes

If performance matters then you should consider one of the normalization solution you have proposed yourself in question. Then you could index the normalized field and get results without doing a full scan.

If for some reason you really don't want to retouch the documents then perhaps the feature you are missing is simple join?

Example query which will do case-insensitive search from within array with a scan:

SELECT c FROM c
join email in c.emailAddresses
where lower(email) = lower('[email protected]')

You can find more examples about joining from Getting started with SQL commands in Cosmos DB.

Note that where-criteria in given example cannot use an index, so consider using it only along another more selective (indexed) criteria.