0
votes

I am adding simple search functionality for my users in my app, and I am strongly considering using Azure CosmosDB. Documents in my Cosmos database (Azure) represent phone calls, and look like this:

{
    "id": "JKEeW3aebSEAzUA",
    "partitionKey": "191625028",
    "ownerId": "191625028",
    "callTime": "2020-06-12T22:13:18.271+00:00",
    "direction": "Inbound",
    "action": "Phone Call",
    "result": "Accepted",
    "callers": [
        {
            "phoneNum": "9182914018",
            "name": "JENKS        OK",
            "location": "Jenks, OK"
        },
        {
            "phoneNum": "9189406524",
            "name": "Main IVR",
            "location": null
        },
        {
            "phoneNum": null,
            "name": "Main IVR",
            "location": null,
        }
    ]
}

I am going to provide search-ability based on the nested properties phoneNum, name, and location within each callers item. I am considering using this query:

SELECT c.id,a.phoneNum,c.callers 
FROM c join a in c.callers 
where CONTAINS(a.phoneNum, '4018')

Is this the most efficient way to perform this kind of search? I am open to restructuring my documents to make searching within those fields faster. Some things to note:

  1. It's a multi-tenant system, and we are using a "partition-per-tenant" scheme in this particular database.
  2. Some partitions/tenants will have 1,000,000+ call records and 3,000,000 - 4,000,000 nested caller records when the data import is complete.

I am new to Azure CosmosDB. Currently we provide limited search functionality similar to this via SQL Server. This structure is identical to our structure in SQL Server (parent call records, child caller records).

2
is there a reason why you don't want to use azure search service?Nonik
I’m voting to close this question because your question kinda sounds like code improvement and/or review and if so may be off-topic for SO. It may be better suited for another SE site but be sure to read the relevant FAQ; and/or re-wording your question as necessary before cross-posting. How to Ask. Good luck!MickyD
@Nonik, I have considered Azure search. It seems overkill though. I don't need full linguistic search capabilities. I just need plain text search. Also, the structure/pricing model is very confusing. I have no idea how many partitions/replicas I need. And it seemed that for a very basic setup, I might need something that costs me around $2,000. My current SQL Server bill is around $300, and I would like to keep the cost around that same range.Matt Spinks
This really requires benchmarking to determine the best modeling for you. That said: it appears you've created an unbounded array scenario for yourself (logging phone calls in an array within a document).David Makogon
Thanks David. I do plan on benchmarking. I am trying to find a good place to start. What do you mean by "an unbounded array scenario"? Are you talking about the calls, or the callers?Matt Spinks

2 Answers

1
votes

I think an optimized solution would be to have multiple container per tenant. This will solve data segregation per tenant as well.

On top of that your partition key can be based parts of one or combinations of your data fields in such way the the number of data set across your container almost equally distributed across all your partitions.

0
votes

It's a multi-tenant system, and we are using a "partition-per-tenant" ... Is this the most efficient way to perform this kind of search?

If you don't filter to a single partition key, you will run the query on every physical partition.

A Cosmos DB physical partition is more like a separate SQL Server than it is like a SQL Server table partition.

In a multi-tenant system most queries should be scoped to a single tenant. See

https://docs.microsoft.com/en-us/azure/cosmos-db/how-to-query-container#in-partition-query

Other than that, make sure you don't exclude that property path from your indexing policy and it should be reasonable. You can always check the Request Units consumed by a query in the portal or in the result in code. In Cosmos DB you need to keep an eye on that constantly, as it translates directly into money.

You can share provisioned Request Units among all the containers in a database, and if you use a container-per-tenant model, you can select some other useful partition key for optimized access within a tenant.