0
votes

In our current system we mostly need to search on PublisherId and PlanId. The model structure is as follows:

Publisher Model: Publisher Id, Publisher Name, …..

Plan Model: Plan Id, Plan Name, Publisher Id, …..

The relationship between the Publisher and Plan models is 1:M.

Scenario: We cannot use Publisher Id or Plan Id on its own as the partition key, because we have only 3-5 publishers and they submit data in bulk, so a single partition might soon cross the 10 GB limit.

2
Could you please specify: what exactly is your question? - MyStackRunnethOver
Questions: 1: What happens if a partition key exceeds 10 GB? Do we need to redistribute the data? 2: Which partition key should we choose from the above? Do we need to concatenate publisherId and planId, or generate a new key? 3: Should we keep everything in one collection, since it is easy to run a single query, or should we use multiple collections? - Satvinder Singh

2 Answers

0
votes

From what is given, Publisher Id sounds like a good candidate for a partition key, but it is not sufficient on its own.

I would suggest combining it with another value to create a partition key that spreads the data. One value that might work well is the year. That is, create an id that combines the Publisher Id with the year in which the document was created, e.g. <PublisherId>.2019 (you could also include the month if you have a very large number of documents per publisher per year).

This makes it fairly easy to archive older content over time, and it could also benefit queries, though that depends on your system.

As you note, you will need to look at how your data is distributed and pick a partition key that will keep working as you scale.
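As a rough illustration of the Publisher Id plus year key described above, here is a minimal Python sketch. The function name `make_partition_key`, the `partitionKey` property, the sample document, and the `-MM` month suffix are all assumptions made for this example; only the `<PublisherId>.2019` shape comes from the answer.

```python
from datetime import date


def make_partition_key(publisher_id: str, created: date, include_month: bool = False) -> str:
    """Build a synthetic key such as "1.2019" (or "1.2019-05" with the month).

    The month suffix format is an assumption; any stable format would do.
    """
    key = f"{publisher_id}.{created.year}"
    if include_month:
        key += f"-{created.month:02d}"
    return key


# Hypothetical plan document; the key is computed once, when the document is written.
doc = {"id": "plan-42", "publisherId": "1", "planId": "P1"}
doc["partitionKey"] = make_partition_key(doc["publisherId"], date(2019, 5, 3))
```

A query that knows the publisher and the year can then target a single partition, while a query across several years has to touch one partition per year.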

0
votes

The 10 GB limit applies to a logical partition, and you should not need to worry about it as long as you choose a partition key with enough distinct values to spread the data.

I assumed your document would look something like the following, and I added a new synthetic partition key, publisherIdentifier:

{
  "publisherIdentifier": "1.Content.USA",
  "publisherId": "1",
  "publisherName": "A",
  "publisherType": "Content",
  "publisherCountry": "USA",
  "plans": [{"planId": "P1"},{"planId": "P2"},{"planId": "P3"}]
}
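The publisherIdentifier value in this document looks like the id, type, and country joined with dots. A minimal Python sketch of deriving it that way follows; the helper name `with_publisher_identifier` is hypothetical, and the choice of exactly these three fields is inferred from the sample value rather than stated anywhere.

```python
def with_publisher_identifier(publisher: dict) -> dict:
    """Return a copy of the document with a synthetic publisherIdentifier.

    Assumption: the key is "<publisherId>.<publisherType>.<publisherCountry>",
    as the sample value "1.Content.USA" suggests.
    """
    parts = (
        publisher["publisherId"],
        publisher["publisherType"],
        publisher["publisherCountry"],
    )
    return {**publisher, "publisherIdentifier": ".".join(parts)}


publisher = {
    "publisherId": "1",
    "publisherName": "A",
    "publisherType": "Content",
    "publisherCountry": "USA",
    "plans": [{"planId": "P1"}, {"planId": "P2"}, {"planId": "P3"}],
}
doc = with_publisher_identifier(publisher)
```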

You can then query the publishers by plan:

SELECT VALUE publisher.publisherName
FROM publisher
JOIN plans IN publisher.plans
WHERE plans.planId = "P1"
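To make the effect of the JOIN concrete, here is an in-memory Python sketch of what the query computes: each document is expanded over its own plans array, and the publisher name is emitted once per matching plan. This is only an emulation on sample data, not a call to the database, and the function name `publisher_names_for_plan` and the second sample publisher are made up for the example.

```python
def publisher_names_for_plan(publishers: list, plan_id: str) -> list:
    """Emulate the query: one output row per (publisher, matching plan) pair."""
    return [
        publisher["publisherName"]
        for publisher in publishers
        for plan in publisher.get("plans", [])  # JOIN plans IN publisher.plans
        if plan.get("planId") == plan_id        # WHERE plans.planId = "P1"
    ]


# Sample data; publisher "B" is hypothetical.
publishers = [
    {"publisherName": "A", "plans": [{"planId": "P1"}, {"planId": "P2"}, {"planId": "P3"}]},
    {"publisherName": "B", "plans": [{"planId": "P2"}]},
]
names = publisher_names_for_plan(publishers, "P1")
```

Note that the JOIN only iterates within a single document, so it does not combine data from separate Publisher and Plan documents.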