1
votes

I'm fairly new to NoSQL type databases, including Azure's DocumentDB. I've read through the documentation and understand the basics.

The documentation left me with some questions about data modeling, particularly in how it relates to pricing.

Microsoft charges fees on a "per collection" basis, with a collection being a list of JSON objects with no particular schema, if I understand it correctly.

Now, since there is no requirement for a uniform schema, is the expectation that your "collection" is analogous to a "database" in that the collection itself might contain different types of objects? Or is the expectation that each "collection" is analogous to a "table" in that it contains only objects of similar type (allowing for variance in the object properties, perhaps).

Does query performance dictate one way or another here?

Thanks for any insight!

2

2 Answers

3
votes

The normal pattern under DocumentDB is to store lots of different types of objects in the same "collection". You distinguish them by either have a field type = "MyType" or with isMyType = true. The latter allows for subclassing and mixin behavior.

As for performance, DocumentDB gives you guaranteed 10ms read/15ms write latency for your chosen throughput. For your production system, put everything in one big "partitioned collection" and slide the size and throughput levers over time as your space needs and load demands. You'll get essentially infinite scalability and DocumentDB will take care of allocating (and deallocating) resources (secondaries, partitions, etc.) as you increase (or decrease) your throughput and size levers.

3
votes

A collection is analogous to a database, more so than a relational table. Normally, you would store a type property within documents to distinguish between types, and add the AND type='MyType' filter to each of your queries if restricting to a certain type.

Query performance will not be significantly different if you store different types of documents within the same collection vs. different collections because you're just adding another filter against an indexed property (type). You might however benefit from pooling throughput into a single collection, vs. spreading small amounts of throughput for each type/collection.