1
votes

For a project, I'm considering Cosmos DB (SQL API) as my database solution. Reading the docs about Request Units, I learned that reading one item of size 1 KB costs 1 RU (Request Unit).

When I execute the query below, which returns all items within a single partition (Gender is the PartitionKey), I get a result of 5,000 items, in five chunks of 1,000 items. Each item is 1.5 KB in size, so it should cost even more than 1 RU per item. However, the header states that the RequestCharge is only 88.12 in total for 1,000 items. Following the rule of 1 RU per 1 KB item, I was expecting at least 1,000 RU.

Does anybody know how to interpret the RequestCharge correctly?

Code and query:

    public async Task<List<Profile>> GetAllProfilesByGender(string gender)
    {
        var container = GetContainer();
        var queryIterator = GetQueryIterator(container, gender);

        var profiles = new List<Profile>();
        while (queryIterator.HasMoreResults)
        {
            var resultSet = await queryIterator.ReadNextAsync();
            foreach (var profile in resultSet)
            {
                profiles.Add(profile);
            }
        }

        return profiles;
    }

    private FeedIterator<Profile> GetQueryIterator(Container container, string gender)
    {
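        // Pass the value as a query parameter rather than interpolating it into the SQL text.
        var query = new QueryDefinition("SELECT * FROM c WHERE c.Gender = @gender")
            .WithParameter("@gender", gender);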
        return container.GetItemQueryIterator<Profile>(query);
    }
1
This is unrelated to your question, but gender is not a partition key that scales very well if you are planning on growing beyond 20 GB in size and/or have a write-heavy scenario. You can learn more in "Partitioning in Cosmos DB". - Mark Brown

@MarkBrown Thanks for the comment. The db will never reach that size (1-2 GB at most) and write operations will only be on single items. - JJuice

OK. Less important then for single-partition containers. Thanks. - Mark Brown

@MarkBrown Your comment made me rethink it. Maybe you can give some advice on this one? Queries on this container will always specify at least Gender (users query for either men or women, but never both at the same time) plus a subset of the other properties of the Profile documents (the end user can filter on FirstName, City, Education and Profession, next to Gender). I picked Gender as the PartitionKey because it is the only suitable one that prevents me from doing cross-partition queries. What's your take on this? - JJuice

Sure, if it's always used in queries, then that's part of what makes a good partition key, because queries without one will fan out across all partitions. Again, not a big deal in small collections. It's just a design consideration when designing NoSQL databases for large scale. - Mark Brown

1 Answer

3
votes

Request Unit (RU) charge doesn't scale linearly with the number of documents retrieved. There are many factors involved: complexity of query, usage of indexes, etc.

The notion of "one read of one 1 KB document costs 1 RU" is something entirely different: it describes a point read of a single document by its id and partition key. Often a single document is all that's needed, which is why there's a read API call as well as a query call. If you compare read vs. query for that same single document, you'll find that the query version costs more, RU-wise, than the read version (as it has to invoke the query engine, deal with indexes, etc.).
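One way to see this for yourself is to fetch the same document both ways and compare the reported charges. The following is only a minimal sketch, assuming the Microsoft.Azure.Cosmos v3 SDK and that the container's partition key is the Gender property, as in the question. `ChargeComparison` and `CompareAsync` are hypothetical names, and the numbers you get will depend on document size and indexing policy.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class ChargeComparison
{
    // Hypothetical helper: fetches one document by point read and by query,
    // then prints the request charge reported for each.
    public static async Task CompareAsync<T>(Container container, string id, string gender)
    {
        // Point read: addresses the document directly by id + partition key.
        ItemResponse<T> read = await container.ReadItemAsync<T>(id, new PartitionKey(gender));
        Console.WriteLine($"Point read: {read.RequestCharge} RU");

        // Query for the same document: this goes through the query engine.
        QueryDefinition query = new QueryDefinition("SELECT * FROM c WHERE c.id = @id")
            .WithParameter("@id", id);
        var options = new QueryRequestOptions { PartitionKey = new PartitionKey(gender) };

        double queryCharge = 0;
        using FeedIterator<T> iterator =
            container.GetItemQueryIterator<T>(query, requestOptions: options);
        while (iterator.HasMoreResults)
        {
            FeedResponse<T> page = await iterator.ReadNextAsync();
            queryCharge += page.RequestCharge; // the charge is reported per page
        }
        Console.WriteLine($"Query:      {queryCharge} RU");
    }
}
```

Calling something like `await ChargeComparison.CompareAsync<Profile>(container, someId, "female")` (with an id that exists in that partition) prints both charges side by side.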

As a side note: I'm not sure you'd ever want to see RU usage scale that way (in your example, 1,000 RU for 1,000 returned documents), as that would end up being enormously expensive. The query engine has gone through many optimizations over the years to push RU cost down.
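To put the observed number in perspective, it can help to tally the charge per page and divide by the number of documents returned. The helper below is a hedged sketch, not part of the SDK: `RuMath` and `Summarize` are made-up names, and the per-page pairs would come from `FeedResponse<T>.Count` and `FeedResponse<T>.RequestCharge` in the loop from the question.

```csharp
using System;
using System.Collections.Generic;

public static class RuMath
{
    // Sums item counts and request charges over all pages of a query and
    // returns the total charge plus the average charge per returned item.
    public static (double TotalRu, double RuPerItem) Summarize(
        IEnumerable<(int ItemCount, double RequestCharge)> pages)
    {
        int items = 0;
        double total = 0;
        foreach (var (itemCount, requestCharge) in pages)
        {
            items += itemCount;
            total += requestCharge;
        }

        // Avoid dividing by zero when the query returned nothing.
        return (total, items == 0 ? 0 : total / items);
    }
}
```

For the page reported in the question, `RuMath.Summarize(new[] { (1000, 88.12) })` works out to roughly 0.088 RU per returned document, far below the 1 RU per document the point-read rule would suggest.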