2 votes

I'm looking into moving to the new partitioned collections for DocumentDB and have a few questions that the documentation and pricing calculator seem to be a little unclear on.

PRICING:

In the below scenario my partitioned collection will be charged $30.02/mo at 1GB of data with a constant hourly RU use of 500:

[screenshot: pricing calculator estimate]

So does this mean that if my users only hit the data for an average of 500 RU's for about 12 hours per day, meaning that HALF the time my collection goes UNUSED but is still RUNNING and AVAILABLE (not shut down), the price goes down to $15.13/mo as the calculator indicates here:

[screenshot: pricing calculator estimate]

Or will I be billed the full $30.02/mo since my collection was up and running?

I get confused when I go to the portal and see an estimate of $606/mo, with no details behind it, when I attempt to spin up the lowest options on a partitioned collection:

[screenshot: portal estimate]

Is the portal just indicating the MAXIMUM that I COULD be billed that month if I use all of my allotted 10,100 RU's every second for 744 consecutive hours?

If billing is based on hourly use and the average RU's used goes down to 100 during some of the hours in the second scenario, does the cost go down even further? Does Azure billing for partitioned collections fluctuate based on hourly usage rather than total uptime like the existing S1/S2/S3 tiers?

If so, then how does the system determine what is billed for that hour? If for most of the hour the RU's used are 100/sec but for a few seconds it spikes to 1,000, does it average out by the second across that entire hour and charge me only something like 200-300 RU's for that hour, or will I be billed for the highest RU's used that hour?

PERFORMANCE:

Will I see a performance hit by moving to this scenario, since my data will be on separate partitions and require a partition id/key to access? If so, what can I expect, or will it be so minimal that it goes unnoticed by my users?

RETRIES & FAULT HANDLING:

I'm assuming the TransientFaultHandling NuGet package I use in my current scenario will still work in the new one, though it may not be exercised as much since my RU capacity is much larger. Or do I need to rethink how I handle requests that go over the RU cap?

RU (throughput) is reserved for your db based on the limit you set (if using the user-defined performance option); otherwise it is set at whatever S plan you selected. But you will get billed per hour whether you make use of it all or not. - Plac3Hold3r
If that is the case then the pricing calculator is TERRIBLY misleading. This is where my confusion lies. If in fact you MUST be billed the minimum 10,100 RU units per hour on the user-defined performance option why would the pricing calculator allow you to see what the cost would be with that option for only using 500 RU's as my screenshots show above? Why wouldn't they lock the values to reflect that the minimum is 10,100 RU's? - INNVTV

2 Answers

1 vote

The way that pricing works for Azure DocumentDB is that you pay to reserve a certain amount of data storage (in GBs) and/or throughput (in Request Units, RUs). These charges apply per hour that the reservation is in place and do not require usage. Additionally, just having a DocumentDB account active is deemed to be an active S1 subscription until a database gets created, at which point the pricing of your db takes over. There are two options available:

Option 1 (Original Pricing)

You can choose between S1, S2 or S3. Each offers the same 10GB of storage but varies in throughput: 250RU / 1,000RU / 2,500RU respectively.

Option 2 (User-defined performance)

This is the new pricing structure, which better decouples size and throughput. This option additionally provides for partitioning. Note that with user-defined performance you are charged per GB of data storage used (pay-as-you-go storage).

With user-defined performance levels, storage is metered based on consumption, but with pre-defined performance levels, 10 GB of storage is reserved at the time of collection creation.

Single Partition Collection

The minimum is set at 400RU and 1GB of data storage.

The maximum is set at 10,000RU and 250GB of data storage.

Partitioned Collections

The minimum is set at 10,000RU and 1GB of data storage.

The maximum is set at 250,000RU and 250GB of data storage (EDIT: you can request a greater limit).

So at a minimum you will be paying the cost per hour related to the option you selected. The only way to not pay for an hour would be to delete the db and the account, unfortunately.
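Those ranges can be captured in a small sketch; the `LIMITS` table and `is_valid_config` helper are illustrative only, not part of any SDK:

```python
# RU / storage ranges for user-defined performance, as listed above.
LIMITS = {
    "single":      {"ru": (400, 10_000),     "gb": (1, 250)},
    "partitioned": {"ru": (10_000, 250_000), "gb": (1, 250)},
}

def is_valid_config(kind: str, ru: int, gb: int) -> bool:
    """Check a requested reservation against the published min/max."""
    lo_ru, hi_ru = LIMITS[kind]["ru"]
    lo_gb, hi_gb = LIMITS[kind]["gb"]
    return lo_ru <= ru <= hi_ru and lo_gb <= gb <= hi_gb

print(is_valid_config("partitioned", 500, 1))  # False: below the 10,000 RU minimum
print(is_valid_config("single", 500, 1))       # True: 500 RU fits 400-10,000
```

This also shows why the asker's 500 RU scenario cannot apply to a partitioned collection: it falls below the partitioned minimum.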

Cost of Varying RU

In terms of varying your RU within the time frame of one hour, you are charged for that hour at the cost of the peak reserved RU you requested. So if you were at 400RU and you bump it up to 1,000RU for one second, you will be charged at the 1,000RU rate for that hour, even if for the other 59 minutes and 59 seconds you set it back to 400RU.
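A minimal sketch of that peak-per-hour rule; the rate constant is a made-up placeholder, not the real Azure rate (check the pricing page for actual figures):

```python
# Each hour is billed at the PEAK reserved RU set during that hour.
RATE_PER_100RU_HOUR = 0.008  # assumed illustrative rate in USD, not the real one

def hourly_charge(ru_levels_this_hour):
    """Bill for one hour given every RU level that was set during it."""
    peak_ru = max(ru_levels_this_hour)
    return (peak_ru / 100) * RATE_PER_100RU_HOUR

# 400 RU all hour, except a one-second bump to 1,000 RU:
print(hourly_charge([400, 1000, 400]))  # billed at the full 1,000 RU rate
print(hourly_charge([400]))             # a steady hour is billed at 400 RU
```

The point is that `max()`, not an average, drives the hourly bill, which answers the "does it average out by the second" question above.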

0 votes

Will I see a performance hit by moving to this scenario since my data will be on separate partitions and require partition id/key to access?

On the topic of a performance hit there are a few things to think about, but in general, no.

If you have a sane partition key with enough values you should not see a performance penalty. This means that you need to partition your data so that you have the partition key available when querying, and you need to keep the data you want from a query in the same partition by using the same partition key.

If you run queries without a partition key, you will see a severe penalty, as the query is parsed and executed per partition.

One thing to keep in mind when selecting a partition key is the limits for each partition, which are 10GB and 10K RU. This means that you want an even distribution over the partitions in order to avoid a "hot" partition; even if you scale to more than enough RU in total, you may receive 429 (throttled) responses for a specific partition.
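To illustrate the skew problem, here is a toy sketch; the hash placement and partition count are made up for illustration and are not DocumentDB's actual scheme:

```python
from collections import Counter
from hashlib import md5

PHYSICAL_PARTITIONS = 4  # illustrative; the service manages this internally

def partition_for(key: str) -> int:
    # Toy hash-based placement; DocumentDB uses its own hash partitioning.
    return int(md5(key.encode()).hexdigest(), 16) % PHYSICAL_PARTITIONS

# A low-cardinality key value piles every document onto one partition...
skewed = Counter(partition_for("US") for _ in range(1000))
# ...while a high-cardinality key (e.g. a user id) spreads the load.
spread = Counter(partition_for(f"user-{i}") for i in range(1000))
print(dict(skewed))  # one "hot" partition holds all 1000 documents
print(dict(spread))  # documents spread across all partitions
```

With the skewed key, all the RU budget of a single partition is hammered while the others sit idle, which is exactly how you end up throttled despite having spare capacity overall.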