
I am creating an Azure Cosmos DB database to store phone call records for multiple customers in a multi-tenant system. Users need to search these phone records by phone number, name, extension number, and date range. The structure for each phone call record is pretty solid (I think) and looks like this:

{
    "id": "M7o_ddRB7VRpjUA",
    "ownerTime": "2458671015_202008",
    "ownerId": "2458671015",
    "callTime": "2020-08-30T18:47:44.424+00:00",
    "direction": "Outbound",
    "action": "VoIP Call",
    "result": "Call connected",
    "duration": 57,
    "hasR": true,
    "hasV": false,
    "xNums": [
        "2605"
    ],
    "xIds": [
        "2204328014"
    ],
    "names": [
        "sally wayfield"
    ],
    "phoneNums": [
        "2098368307",
        "2097449211"
    ],
    "emails": [
        "[email protected]"
    ],
    "xNums_s": "2605",
    "xIds_s": "2204328014",
    "phoneNums_s": "2098368307 2097449211",
    "names_s": "sally wayfield",
    "emails_s": "[email protected]",
    "_rid": "QB1nAK5kxPMBAAAAAAAAAA==",
    "_self": "dbs/QB1nAA==/colls/QB1nAK5kxPM=/docs/QB1nAK5kxPMBAAAAAAAAAA==/",
    "_etag": "\"d2013319-0000-0500-0000-5f4c63dd0000\"",
    "_attachments": "attachments/",
    "_ts": 1598841821
}

But I have struggled to come up with a good partition key strategy. My initial thought was to keep it simple and designate the account id of each tenant as the partition key. In the document above, that account id is the ownerId. I did some benchmarking on data imported from our old database, and it worked well enough. However, the volume of call records varies considerably from tenant to tenant. One particular problem I came across is that several tenants are ingesting around 10,000 call records per day, and for a 6-month period that equated to around 4 GB. As we continue to ingest 10,000 call records per day for these accounts, we will hit the 20 GB logical partition limit in under 3 years. So this strategy will not work.
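To make that estimate concrete, here is the back-of-the-envelope math behind it (a quick Python calculation using the figures above; 20 GB is the Cosmos DB logical partition size limit):

# Growth figures from the heavy tenants described above.
gb_per_six_months = 4.0
partition_limit_gb = 20.0                  # Cosmos DB logical partition size limit

gb_per_month = gb_per_six_months / 6       # ~0.67 GB per month
months_to_limit = partition_limit_gb / gb_per_month
print(f"~{months_to_limit:.0f} months (~{months_to_limit / 12:.1f} years) to reach {partition_limit_gb:.0f} GB")
# => ~30 months (~2.5 years), i.e. under 3 years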

My next idea was to include temporal data as part of the partition key. This seems to make sense, as we batch-import large quantities of time-stamped data on an ongoing, daily basis, and 99% of all queries against this database will include a date range. I came up with a derived, synthetic value, labeled in the document above as ownerTime. This value combines the account id with a time factor (the year and month of each phone call). Using this scheme, each tenant gets 12 partitions per year instead of a single partition for the whole account. The volume per tenant still varies, so we will still have some large partitions alongside some tiny ones, but the differences will not be as extreme as with one partition per tenant.
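For illustration, this is roughly how such a synthetic key could be built when writing a document (a minimal Python sketch; the helper name is mine, not part of any SDK):

from datetime import datetime

def make_owner_time(owner_id: str, call_time_iso: str) -> str:
    """Build the synthetic partition key '<ownerId>_<yyyyMM>' from the call time."""
    call_time = datetime.fromisoformat(call_time_iso)
    return f"{owner_id}_{call_time.strftime('%Y%m')}"

doc = {
    "id": "M7o_ddRB7VRpjUA",
    "ownerId": "2458671015",
    "callTime": "2020-08-30T18:47:44.424+00:00",
    # ...remaining fields...
}
doc["ownerTime"] = make_owner_time(doc["ownerId"], doc["callTime"])
# => "2458671015_202008"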

Using the "one partition per tenant" strategy, a typical query looks like this:

partitionKey = /ownerId

select c.id,c.callTime,c.direction,c.action,c.result,c.duration,c.hasR,c.hasV,c.callers
from c
where
c.ownerId='2458671015'
and c.callTime>='2020-01-01T00:00:00 +000'
and c.callTime<='2020-08-31T23:59:59 +000'
and (CONTAINS(c.phoneNums_s, 'rus')
or CONTAINS(c.names_s, 'rus')
or CONTAINS(c.xNums_s, 'rus'))
order by c.callTime desc
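For reference, executing this query scoped to a single partition with the azure-cosmos Python SDK might look like the sketch below (the account endpoint, key, and database/container names are placeholders; when the partition_key argument is supplied, the ownerId filter does not need to appear in the WHERE clause at all):

from azure.cosmos import CosmosClient

client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("calls-db").get_container_client("call-records")

query = """
select c.id, c.callTime, c.direction, c.action, c.result, c.duration, c.hasR, c.hasV
from c
where c.callTime >= @start and c.callTime <= @end
  and (CONTAINS(c.phoneNums_s, @term)
       or CONTAINS(c.names_s, @term)
       or CONTAINS(c.xNums_s, @term))
order by c.callTime desc
"""

items = container.query_items(
    query=query,
    parameters=[
        {"name": "@start", "value": "2020-01-01T00:00:00+00:00"},
        {"name": "@end", "value": "2020-08-31T23:59:59+00:00"},
        {"name": "@term", "value": "rus"},
    ],
    partition_key="2458671015",   # scopes the query to this tenant's single partition
)
for item in items:
    print(item["id"], item["callTime"])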

Using the ownerTime partition strategy, the same query looks like this:

partitionKey = /ownerTime

select c.id,c.callTime,c.direction,c.action,c.result,c.duration,c.hasR,c.hasV,c.callers
from c
where array_contains([
 '2458671015_202001',
 '2458671015_202002',
 '2458671015_202003',
 '2458671015_202004',
 '2458671015_202005',
 '2458671015_202006',
 '2458671015_202007',
 '2458671015_202008'], c.ownerTime)
and c.callTime>='2020-01-01T00:00:00 +000'
and c.callTime<='2020-08-31T23:59:59 +000'
and (CONTAINS(c.phoneNums_s, 'rus')
or CONTAINS(c.names_s, 'rus')
or CONTAINS(c.xNums_s, 'rus'))
order by c.callTime desc

As you can see, with the ownerTime strategy I am explicitly calculating which partitions need to be searched (based on the search date range) and listing those partition key values in the query.
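A small helper can enumerate those partition key values from the requested date range (a sketch; the function name is mine):

from datetime import date

def owner_time_keys(owner_id: str, start: date, end: date) -> list:
    """List every '<ownerId>_<yyyyMM>' partition key touched by the date range."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{owner_id}_{year:04d}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys

print(owner_time_keys("2458671015", date(2020, 1, 1), date(2020, 8, 31)))
# ['2458671015_202001', '2458671015_202002', ..., '2458671015_202008'] (8 values)

The resulting list can then be passed as a query parameter to the ARRAY_CONTAINS filter shown above instead of hard-coding the values.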

In my initial benchmarking tests, I attempted to retrieve 25 records with these queries against one of my high volume accounts (with 1.1 million phone records). The "single partition per tenant" query used around 2200 RUs to bring back a single record. I had to keep using the continuation token to get more results, with each request using around 2000 RUs until I finally got to 25 records. I lost count of the RUs after many iterations. The ownerTime partition query used around 2500 RUs to retrieve 25 records without having to use the continuation token.

It seems that the ownerTime strategy is the better choice, as long as all queries include a date range. But I wanted to get some feedback on this strategy. It's not perfect, but it seems to work well. I come from a SQL background, and this is my first time implementing a document database; I am doing it to take advantage of the improved text searching and scalability. Are there "gotchas" I might encounter in the future using the ownerTime partition strategy? Based on the information I've provided, is there a partition strategy that might work better?


1 Answer


Bounded cross-partition queries are definitely much better than unbounded ones, so based on what you've included in your question, I agree that including the time element to create a synthetic partition key is a good partitioning strategy.

What's not explained here is whether there are other high-volume queries that do not include time (or ownerId) in their filter predicates. If there are, you will need to use Change Feed to copy the data from the first container into a second container, kept in sync, with a partition key that makes sense for those queries.
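As a rough sketch of that idea with the azure-cosmos Python SDK (the container names are placeholders, the is_start_from_beginning flag is an assumption about the SDK version, and in practice you would run this continuously via the change feed processor or an Azure Functions Cosmos DB trigger rather than a one-off loop):

from azure.cosmos import CosmosClient

client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<key>")
db = client.get_database_client("calls-db")
source = db.get_container_client("call-records")            # partitioned by /ownerTime
target = db.get_container_client("call-records-by-phone")   # partitioned for the other query shape

# Read changes from the beginning of the source container and copy each document
# across; system properties (_rid, _etag, ...) are stripped before the upsert.
for doc in source.query_items_change_feed(is_start_from_beginning=True):
    target.upsert_item({k: v for k, v in doc.items() if not k.startswith("_")})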

Another option, especially if you're looking to do analytics over this data, is to enable analytical storage and use Synapse Link, then write Spark (and soon SQL Serverless) queries over the data.