After reading all the documentation I can find on DocumentDB, I'm still struggling with how best to design the partition key. Take a scenario of corporate emails sent to employees/departments. Assume a massive (fictitious) company with 1 million employees that sends a couple of million emails a week. Most of them get opened and clicked, so a large amount of data is ingested.
Let me represent some of the entities as JSON. The source data, at least in my case, lives in a SQL Server, but I'd like to track opens and clicks by employee and by department. In a large organization this data can grow quickly, which is the use case for DocumentDB. I don't want to debate the merits of DocumentDB for this; I'm just looking to better understand partition key design. Boiling it down:
Data
- Newsletter:
{newsletterId: 1, name: 'something', departments: [1,2,3]} // this newsletter was sent to 3 company departments
- Employee:
{employeeId: 1212, name: 'John Smith'}
- NewsletterEmployeeActivity:
{newsletterId: 1, employeeId: 1212, link: 2, date: '1-2-2017'} and/or {newsletterId: 1, employeeId: 1212, open: '1-2-2017'} // where link is the id of a link in the email
Reports
- Opens and Clicks by Newsletter by Department
- Opens and Clicks by Department
- Opens and Clicks for the entire Newsletter
- Clicks by Link by Newsletter (just assume we can map the link id to the link)
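To make the grouping behind each report explicit, here is a rough in-memory sketch in Python. It is not DocumentDB code; it only shows which fields each report groups on. The `department_of` lookup is an assumption made for the example, since the Employee document above carries no department field, and the extra employee ids are made up.

```python
from collections import Counter

# Hypothetical lookup: the Employee document above has no department field,
# so this employee -> department mapping is assumed for illustration only.
department_of = {1212: 1, 1313: 2, 1414: 2}

# Sample NewsletterEmployeeActivity documents in the two shapes shown above.
activities = [
    {"newsletterId": 1, "employeeId": 1212, "open": "1-2-2017"},
    {"newsletterId": 1, "employeeId": 1212, "link": 2, "date": "1-2-2017"},
    {"newsletterId": 1, "employeeId": 1313, "open": "1-2-2017"},
    {"newsletterId": 1, "employeeId": 1414, "link": 3, "date": "1-3-2017"},
    {"newsletterId": 2, "employeeId": 1414, "open": "1-4-2017"},
]

def kind(doc):
    """A document with a 'link' field is a click; otherwise it is an open."""
    return "click" if "link" in doc else "open"

by_newsletter_department = Counter()  # report 1
by_department = Counter()             # report 2
by_newsletter = Counter()             # report 3
clicks_by_link = Counter()            # report 4

for doc in activities:
    dept = department_of[doc["employeeId"]]
    k = kind(doc)
    by_newsletter_department[(doc["newsletterId"], dept, k)] += 1
    by_department[(dept, k)] += 1
    by_newsletter[(doc["newsletterId"], k)] += 1
    if k == "click":
        clicks_by_link[(doc["newsletterId"], doc["link"])] += 1
```

Three of the four reports group on newsletterId, while the department-only report cuts across newsletters, which is part of why a single partition key is hard to pick here.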
How would you architect the partition key? Would it make sense to have different document types that correspond to the reports? I.e., one that tracks employee clicks/opens (this would be a large amount of data), one that increments aggregates by department, an aggregate for the newsletter (maybe just sum the departments), etc. Or would this get expensive to implement, since you might get hit with 10,000 opens at about the same time as an email is read?
From what I've read about hot partitions, the approach above would seem to fall into that category.
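To illustrate the concern, here is a small Python sketch that counts how 10,000 opens of one newsletter would be distributed over partition key values under two candidate keys. Everything in it is an assumption for illustration: the two key functions, the `BUCKETS` value, and the sequential employee ids are made up, and it only counts documents per logical key value; it does not model how DocumentDB actually maps keys to physical partitions.

```python
from collections import Counter

BUCKETS = 16  # hypothetical bucket count, chosen only for illustration

def key_by_newsletter(doc):
    """Candidate key 1: every activity for a newsletter shares one key value."""
    return str(doc["newsletterId"])

def key_by_newsletter_bucket(doc):
    """Candidate key 2: newsletter id plus a bucket derived from the employee id."""
    return f"{doc['newsletterId']}-{doc['employeeId'] % BUCKETS}"

def writes_per_key(docs, key_fn):
    """Count how many documents land on each partition key value."""
    return Counter(key_fn(d) for d in docs)

# 10,000 opens of the same newsletter, one per (made-up) employee id.
opens = [{"newsletterId": 1, "employeeId": e, "open": "1-2-2017"}
         for e in range(10_000)]

single = writes_per_key(opens, key_by_newsletter)
spread = writes_per_key(opens, key_by_newsletter_bucket)

print(len(single), max(single.values()))  # 1 10000
print(len(spread), max(spread.values()))  # 16 625
```

With the first key, the whole burst lands on one key value; with the second, it is split across 16 values, but a per-newsletter report would then have to read all 16.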