I am trying to use the Cosmos DB change feed (I'm referring to https://docs.microsoft.com/en-us/azure/cosmos-db/change-feed-processor and https://github.com/Azure/azure-cosmos-dotnet-v2/tree/master/samples/code-samples/ChangeFeedProcessorV2).

When I start multiple instances of a consumer, the observer seems to see only one partition key range. The only message I see is "Observer opened for partition Key Range 0", after which that instance starts receiving the change feed. So the feed is received by only one consumer at any given point. If I close that consumer, the next one picks up happily.

I can't seem to understand partition keys and partition key ranges in Cosmos DB. I've created a database and a collection within it, and defined a partition key, /myId, in which I store a unique GUID. I've saved about 10000 transactions in the collection.

When I look at the partition key ranges using the API (/dbs/db-name/colls/coll-name/pkranges), I see only one node under PartitionKeyRanges. Below is the output I see:

{
    "_rid": "LEAgAL7tmKM=",
    "PartitionKeyRanges": [
        {
            "_rid": "LEAgAL7tmKMCAAAAAAAAUA==",
            "id": "0",
            "_etag": "\"00007d00-0000-0000-0000-5c3645e70000\"",
            "minInclusive": "",
            "maxExclusive": "FF",
            "ridPrefix": 0,
            "_self": "dbs/LAEgAA==/colls/LEAgAL7tmKM=/pkranges/LEAgAL7tmKMCAAAAAAAAUA==/",
            "throughputFraction": 1,
            "status": "online",
            "parents": [],
            "_ts": 1547060711
        }
    ],
    "_count": 1
}

Shouldn't this show more partition key ranges? Is this behavior expected?

How do I get multiple consumers to receive data as shown under https://docs.microsoft.com/en-us/azure/cosmos-db/change-feed-processor?

1 Answer

TL;DR - you should be able to ignore partition key ranges and the number of them you have and just let Change Feed Processor manage that for you.

Partition key ranges are an implementation detail we currently leak. The short answer is that we add new partition key ranges when we want to restructure how your data is stored in the backend. This can happen for lots of reasons: you add more data, you consume a lot of RUs for a subsection of that data, or we just want to shuffle things around. Theoretically, if you kept adding data, we'd eventually split the range in two.
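
To make the behavior in the question concrete, here is a rough Python sketch of why only one consumer is active when the collection has a single partition key range. It assumes the processor keeps one lease per partition key range and that a lease is owned by one host at a time. The `assign_leases` helper and the host names are hypothetical, and the round-robin split is a simplification, not the library's actual balancing logic.

```python
import json

# Trimmed copy of the /pkranges response from the question
# (only the fields this sketch needs).
PKRANGES_RESPONSE = """
{
    "PartitionKeyRanges": [
        {"id": "0", "minInclusive": "", "maxExclusive": "FF", "status": "online"}
    ],
    "_count": 1
}
"""


def assign_leases(range_ids, hosts):
    """Hypothetical helper: hand out one lease per range, round-robin over hosts."""
    assignment = {host: [] for host in hosts}
    for i, range_id in enumerate(range_ids):
        assignment[hosts[i % len(hosts)]].append(range_id)
    return assignment


range_ids = [r["id"] for r in json.loads(PKRANGES_RESPONSE)["PartitionKeyRanges"]]
hosts = ["consumer-a", "consumer-b", "consumer-c"]  # hypothetical host names

assignment = assign_leases(range_ids, hosts)
active_hosts = [host for host, leases in assignment.items() if leases]

print(assignment)
print("hosts receiving changes:", active_hosts)
```

With one range there is one lease, so one host does all the work and the others sit idle until that lease frees up, which matches what you are seeing. If the range were later split in two, the same model would give two hosts one lease each.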

We're working on some updates for the v3 SDKs, currently in preview, to abstract this a bit further, since even the answer I have given above is pretty hand-wavy and we should have a more easily understood contract for public APIs.