
I would like to build a user activity feed of heterogeneous elements with DocumentDB.

I'm considering 3 modeling scenarios, based on this link:

https://github.com/Azure/azure-content/blob/master/articles/documentdb/documentdb-modeling-data.md#when-not-to-embed

Scenario 1: a single document per user with a nested array of feed elements

{
    "userId": "1",
    "feed": [
        {"id": 1, "author": "anon", "image": "https://image.com/y.jpg"},
        {"id": 2, "author": "bob", "status": "wisdom from the interwebs"},
        …
        {"id": 100001, "author": "jane", "quote": "and on we go ..."},
        …
        {"id": 1000000001, "author": "angry", "status": "blah angry blah angry"},
        …
        {"id": ∞ + 1, "author": "bored", "xxx": "oh man, will this ever end?"}
    ]
}

This seems like a bad option: documents have a maximum size limit, so an ever-growing feed array doesn't scale.

Scenario 2: one document per feed element

{
    "userId": "1",
    "id": 1,
    "author": "anon",
    "image": "https://image.com/y.jpg"
},
{
    "userId": "1",
    "id": 2,
    "author": "bob",
    "status": "wisdom from the interwebs"
},
...

This seems like a good solution, but I feel like it wastes the potential of DocumentDB. Too flat? Maybe not optimized.
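
For what it's worth, here is a minimal sketch of how reading one page of the feed could look in scenario 2 with the Node.js documentdb SDK (the account endpoint, collection link, and page size are placeholders of mine; the ORDER BY also assumes a range index on the sorted property):

// Sketch only: reads one page of a user's feed in scenario 2.
import { DocumentClient } from "documentdb";

const client = new DocumentClient("https://myaccount.documents.azure.com:443/",
                                  { masterKey: "<key>" });
const collectionLink = "dbs/feeddb/colls/feeditems"; // hypothetical collection

function readFeedPage(userId: string, continuation: string | undefined,
                      callback: (err: any, items: any[], nextToken?: string) => void) {
    const querySpec = {
        query: "SELECT * FROM c WHERE c.userId = @userId ORDER BY c.id DESC",
        parameters: [{ name: "@userId", value: userId }]
    };
    // maxItemCount bounds the page size; the continuation token from the
    // previous page, if any, resumes the query where it left off.
    const iterator = client.queryDocuments(collectionLink, querySpec,
        { maxItemCount: 20, continuation: continuation });
    iterator.executeNext((err, results, headers) => {
        if (err) { return callback(err, []); }
        callback(null, results, headers["x-ms-continuation"]);
    });
}

Pagination then stays a plain query concern instead of leaking into the document shape.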

Scenario 3: X documents per user, each with a nested array of feed elements

{
    "userId": 1,
    "feed": [
        {"id": 4, "author": "anon", "image": "https://image.com/y.jpg"},
        {"id": 5, "author": "bob", "status": "tails from the field"},
        ...
        {"id": 99, "author": "angry", "status": "blah angry blah angry"}
    ]
},
{
    "userId": 1,
    "feed": [
        {"id": 100, "author": "anon", "status": "yet more"},
        ...
        {"id": 199, "author": "bored", "xxx": "will this ever end?"}
    ]
}

This seems like the best solution, but it adds a lot of complexity to the code (handling delete operations and pagination, handling WHERE clauses across different feed types, ...). I feel like I'd be pushing a functional concern (pagination) down into the storage architecture, which is less flexible.
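
To make the delete complexity concrete: in scenario 3, removing a single feed element means rewriting the whole chunk document that contains it. A sketch, assuming the Node.js documentdb SDK and a chunk document fetched earlier:

// Sketch only: deletes one nested feed element in scenario 3 by
// filtering the array and replacing the entire chunk document.
import { DocumentClient } from "documentdb";

function deleteFeedElement(client: DocumentClient, chunkDoc: any,
                           elementId: number, callback: (err: any) => void) {
    chunkDoc.feed = chunkDoc.feed.filter((f: any) => f.id !== elementId);
    // replaceDocument overwrites the stored document with the new body;
    // _self is the self-link returned when the document was read.
    client.replaceDocument(chunkDoc._self, chunkDoc, (err) => callback(err));
}

In scenario 2 the same operation would be a single deleteDocument call.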

Obviously, scenario 1 is not an option. What do you think of scenarios 2 and 3?

This question is probably off topic for SO since it's primarily opinion based. – Larry Maccherone

As @LarryMaccherone pointed out, this is off-topic. That said: you've identified a legitimate unbounded-array scenario. How you deal with that is really not a simple answer (and you'll likely get multiple responses with multiple ways to approach it). But you've already come up with some ideas on your own, and it's worth drilling down into how you write and read data, how you might be impacted performance-wise, etc. – David Makogon

I don't think it's opinion based, because it's a really common scenario and maybe some people have guidelines about it. Plus, DocumentDB is cloud-based, so we don't know how to optimize it ourselves. Maybe some ways are better than others? – Raph

1 Answer


In my experience, keeping it simple pays off with DocumentDB (considering its current limitations). Scenario 2 allows for the most straightforward way to manage CRUD operations and keeps the code simple. It seems that with the third approach you would need to constantly check how much size a document has accumulated, which is also cumbersome.
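
As a rough sketch of how simple the scenario 2 write path stays (the endpoint, collection link, and id values below are my own placeholders):

// Sketch only: in scenario 2 each feed element is its own document,
// so create and delete are single calls with no array bookkeeping.
import { DocumentClient } from "documentdb";

const client = new DocumentClient("https://myaccount.documents.azure.com:443/",
                                  { masterKey: "<key>" });
const collectionLink = "dbs/feeddb/colls/feeditems"; // hypothetical collection

client.createDocument(collectionLink,
    { userId: "1", id: "42", author: "bob", status: "hello" },
    (err, created) => {
        if (err) { throw err; }
        // Deleting the element later touches no other document.
        client.deleteDocument(created._self, (delErr) => {
            if (delErr) { throw delErr; }
        });
    });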

That being said, if bulk insert is to be used, keep in mind that you will also need to partition the number of records sent in a batch (even with scenario 2), as stored procedure calls are limited to 512 KB as well.
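
A minimal batching sketch, assuming a bulk-import stored procedure is already deployed (the "bulkImport" sproc and the 400 KB cap are my own placeholders, chosen to stay under the 512 KB call limit):

// Sketch only: splits records into batches whose serialized size stays
// under a cap, then sends each batch to a hypothetical "bulkImport" sproc.
import { DocumentClient } from "documentdb";

const MAX_BATCH_BYTES = 400 * 1024; // safety margin below 512 KB

function toBatches(records: any[]): any[][] {
    const batches: any[][] = [];
    let current: any[] = [];
    let currentBytes = 0;
    for (const r of records) {
        const size = Buffer.byteLength(JSON.stringify(r), "utf8");
        if (current.length > 0 && currentBytes + size > MAX_BATCH_BYTES) {
            batches.push(current);
            current = [];
            currentBytes = 0;
        }
        current.push(r);
        currentBytes += size;
    }
    if (current.length > 0) { batches.push(current); }
    return batches;
}

function bulkInsert(client: DocumentClient, sprocLink: string,
                    records: any[], done: (err?: any) => void) {
    const batches = toBatches(records);
    let i = 0;
    const next = (err?: any) => {
        if (err || i === batches.length) { return done(err); }
        // One size-bounded batch per stored procedure call.
        client.executeStoredProcedure(sprocLink, [batches[i++]], next);
    };
    next();
}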

In my experience, hierarchical structures are powerful in DocumentDB, as long as they describe a piece of information that is looked at as a whole. If you need to look at nested parts across different documents, joins and stored procedures can help with that.
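
For example, DocumentDB's intra-document JOIN can flatten the nested feed arrays of scenario 3 back into individual elements (a sketch; the collection link and parameter values are placeholders):

// Sketch only: JOIN unnests each document's "feed" array so nested
// elements can be filtered and returned individually across chunks.
import { DocumentClient } from "documentdb";

const client = new DocumentClient("https://myaccount.documents.azure.com:443/",
                                  { masterKey: "<key>" });

const querySpec = {
    query: "SELECT f FROM c JOIN f IN c.feed WHERE c.userId = @userId AND f.id >= @from",
    parameters: [
        { name: "@userId", value: 1 },
        { name: "@from", value: 100 }
    ]
};

client.queryDocuments("dbs/feeddb/colls/userfeeds", querySpec)
    .toArray((err, elements) => {
        if (err) { throw err; }
        console.log(elements); // each result is one flattened feed element
    });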

Hope this is useful despite any personal opinion in this post :)