16
votes

We're creating a multi-tenant application that must segregate data between tenants. Each tenant will save various documents, each of which can fall into several different document categories. We plan to use Azure blob storage for these documents. However, given our user base and the number of documents and size of each one, we're not sure how to best manage storage accounts with our current Azure subscription.

Here are some numbers to consider. With 5,000 users each storing 27,000 documents of 8 MB per year, that is 1,080 TB per year in total. A storage account maxes out at 500 TB.
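As a quick sanity check, the yearly total works out like this (using decimal units, 1 TB = 1,000,000 MB):

```python
# Back-of-the-envelope check of the yearly storage estimate.
users = 5_000
docs_per_user_per_year = 27_000
doc_size_mb = 8

total_mb = users * docs_per_user_per_year * doc_size_mb
total_tb = total_mb / 1_000_000  # decimal units: 1 TB = 1,000,000 MB
print(total_tb)  # 1080.0 -- more than double the 500 TB account limit
```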

So my question is what would be the most efficient and cost effective way to store this data and stay within the Azure limits?

Here are a few things we've considered:

  1. Create a storage account for each client. THIS DOES NOT WORK because you can only have 100 storage accounts per subscription (this would have been the ideal solution).

  2. Create a blob container for each client. A storage account can hold up to 500 TB, so this could potentially work, except that eventually we would have to split off into other storage accounts. I'm not sure how that would work if a user ended up with data in two accounts. It could get messy.

Perhaps we are missing something fundamentally simple here.

UPDATE For now our thought is to use Azure table storage with a table for each document type. Within each table the partition key would be the tenant's ID, and the row key would be the document ID. Each row would also contain metadata for the document, along with a URI (or something) linking to the blob itself.
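The layout described in the update might look roughly like this; the field names and values below are hypothetical, just to illustrate the entity shape:

```python
# Hypothetical entity in a per-document-type table (e.g. "Invoices").
# PartitionKey = tenant ID groups all of one tenant's documents together;
# RowKey = document ID is unique within that partition.
entity = {
    "PartitionKey": "tenant-42",   # tenant ID
    "RowKey": "doc-0001",          # document ID
    "FileName": "contract.pdf",    # metadata about the document
    "SizeBytes": 8 * 1024 * 1024,
    "BlobUri": "https://pod03.blob.core.windows.net/tenant-files/doc-0001",
}

# Point lookups fetch by (PartitionKey, RowKey); listing one tenant's
# documents is a scan over a single partition.
```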

2
Will you be storing the client/files relationship in some kind of table? For example, a master table which would store the list of all files for all clients? – Gaurav Mantri
@Gaurav Mantri: Great question! I have provided an update to address your question. – spoof3r

2 Answers

14
votes

Not really an answer, but think of it as "food for thought" :). Basically, your architecture should account for the fact that each storage account has certain scalability targets, and your design should ensure you don't exceed them, so that storage stays highly available for your application.

Some recommendations:

  • Start by creating multiple storage accounts (say 10 to begin with). Let's call them Pods.
  • Each tenant gets one of the pods. You can pick a pod storage account randomly or use some predefined logic. The pod assignment is stored alongside the tenant information.
  • From the description it seems that you're currently storing the file information in just one table. That puts a lot of stress on a single table/storage account, which is not a scalable design IMHO. Instead, when a tenant is created, assign a pod to the tenant and create a table for that tenant which will store its file information. This has the following benefits: 1) each tenant's data is nicely isolated, 2) read requests are load-balanced, allowing you to stay within the scalability targets, and 3) since each tenant's data lives in a separate table, the PartitionKey becomes free and you can assign it some other value if needed.
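The pod assignment described above could be sketched like this (pod names, the random strategy, and the table-naming rule are all my assumptions for illustration):

```python
import random

# Hypothetical pod pool: 10 storage accounts to begin with.
PODS = [f"podstorage{i:02d}" for i in range(10)]

def assign_pod(tenant_id: str) -> str:
    """Pick a pod storage account for a new tenant. Random here;
    round-robin or least-loaded would work equally well."""
    return random.choice(PODS)

def tenant_table_name(tenant_id: str) -> str:
    """Per-tenant table holding that tenant's file information.
    Azure table names must be alphanumeric, so strip separators."""
    return "files" + tenant_id.replace("-", "")

# Store the assignment alongside the tenant record.
record = {
    "tenant": "tenant-42",
    "pod": assign_pod("tenant-42"),
    "table": tenant_table_name("tenant-42"),
}
```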

Now coming on to storing files:

  • Again you can go with the Pod concept wherein files for each tenant reside in the pod storage account for that tenant.
  • If you see issues with this approach, you can randomly pick the pod storage account and put the file there and store the blob URL in the Files table.
  • You could either go with just one blob container (say tenant-files) or separate blob containers for each tenant.
  • With just one blob container for all tenants, the management overhead is smaller, as you just have to create this container when a new pod is commissioned. The downside is that you can't logically separate files by tenant, so if you want to provide direct access to the files (using a Shared Access Signature), it would be problematic.
  • With separate blob containers for each tenant, the management overhead is higher, but you get nice logical isolation. Here, as a tenant is brought on board, you have to create a container for that tenant in each pod storage account. Similarly, when a new pod is commissioned, you have to ensure a blob container is created for each existing tenant in the system.
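The two provisioning flows in the separate-container-per-tenant model can be sketched as follows, with the storage layout modeled as a plain dict (pod and container names are made up; a real implementation would call the storage API instead):

```python
# Model: {pod storage account -> set of container names}.
pods = {"pod01": set(), "pod02": set()}
tenants = []

def onboard_tenant(tenant_id):
    """New tenant: create that tenant's container in every pod."""
    tenants.append(tenant_id)
    for containers in pods.values():
        containers.add(f"tenant-{tenant_id}")

def commission_pod(pod_name):
    """New pod: create a container for every existing tenant."""
    pods[pod_name] = {f"tenant-{t}" for t in tenants}

onboard_tenant("42")
commission_pod("pod03")
```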

Hope this gives you some idea about how you can go about architecting your solution. We're using some of these concepts in our own solution (which uses Azure Storage exclusively as its data store). It would be really interesting to see what architecture you come up with.

9
votes

I am just going to share my thoughts on the topic, and they do overlap somewhat with Gaurav Mantri's answer. This is based on a design I came up with after doing something very similar at my current job.

Azure Blob storage

  1. Randomly select a pod from the pod pool when a tenant is created, and store its namespace alongside the tenant information.

  2. Provide an API for creating containers, where container names are a composite of the tenant id (Guid::ToString("N")) + <resourcename>. You don't need to sell these to your users as containers; they can be folders, worksets, or a filebox; you find a name.

  3. Provide an api for maintaining documents within these containers.
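The naming scheme from step 2 might look like this (a sketch; the validation here is simplified, and the sample GUID and resource name are made up). The "N" GUID format is 32 hex digits with no hyphens, which Python's `uuid.hex` matches:

```python
import uuid

def container_name(tenant_id: uuid.UUID, resource: str) -> str:
    """Composite container name: tenant GUID in "N" format
    (32 hex digits, no hyphens, like .NET's Guid::ToString("N"))
    plus a resource name. Azure container names must be 3-63
    lowercase letters, digits, or hyphens."""
    name = tenant_id.hex + resource.lower()
    if not (3 <= len(name) <= 63):
        raise ValueError("container name length out of range")
    return name

tenant = uuid.UUID("12345678-1234-1234-1234-123456789abc")
print(container_name(tenant, "invoices"))
# 12345678123412341234123456789abcinvoices
```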

This means you can simply grow the pod pool as you take on more tenants, and, for example, remove pods from the pool as they fill up.

The benefit of this is that you do not need to maintain two systems for your data (both table storage and blob storage). Blob storage already has a way to present data as a directory/file hierarchy.

Extension Points

Blob Storage Api Broker

On top of the above design I made an OWIN middleware that sits between clients and blob storage, basically just forwarding requests from clients to blob storage. This step is of course not needed, as you can hand out normal SAS tokens and let clients talk directly to blob storage. But it makes it easy to hook into actions as they are executed on files. Each tenant gets its own endpoint: files/<tenantid>/<resourcename>/

Such an API would also let you hook into whatever token authentication system you may already be using, to authenticate and authorize the incoming requests and then sign them in this API.

Blob Storage Metadata

Using the above API broker extension combined with metadata, one can take it a step further: modify incoming requests to always include metadata, and add filters on the XML returned from blob storage before sending it to clients, to filter out containers or blobs. One example: when a user deletes a blob, set x-ms-meta-status:deleted and filter such blobs out when returning blobs/containers. This way you can run different deletion procedures behind the scenes.
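The soft-delete filtering described here could look something like this (a sketch; the listing structure is a simplified stand-in for the XML a broker would actually rewrite):

```python
# Blobs carry an x-ms-meta-status metadata value; the broker drops
# "deleted" (and, per the subfolder trick below, "hidden") entries
# before returning a listing to the client.
def visible_blobs(listing):
    """Filter out blobs whose status metadata hides them."""
    return [b for b in listing
            if b.get("metadata", {}).get("status") not in ("deleted", "hidden")]

listing = [
    {"name": "a.pdf", "metadata": {}},
    {"name": "b.pdf", "metadata": {"status": "deleted"}},
    {"name": ".keep", "metadata": {"status": "hidden"}},
]
print([b["name"] for b in visible_blobs(listing)])  # ['a.pdf']
```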

One should be careful here, since you don't want to put too much logic in this layer (it adds a penalty to every request), but done smartly this can work very nicely.

This extension would also allow your users to create "empty" subfolders inside a container, by placing a zero-byte file with a status:hidden that also gets filtered out. (Remember that blob storage can only show virtual folders if there is something in them.) This could also be achieved using table storage.

Azure Search

Another great extension point is that you could index each blob in Azure Search to make its content findable, and this is most likely my favorite. I don't see any good solution using just blob storage or table storage that gives you good search functionality, or to some extent even a good filtering experience. Azure Search would give users a really rich experience for finding their content again.

Snapshots

Another extension is that snapshots could be created automatically every time a file is modified. This becomes even easier with the broker API; otherwise, monitoring the logs is an option.

These ideas come from a project I started and wanted to share, but since I am busy at work over the coming months, I don't see myself releasing it before the summer holidays give me time to finish. The motivation of the project is to provide a NuGet package that enables other developers to quickly set up the broker API I mentioned above and configure a multi-tenant blob storage solution.

I kindly ask you to vote this answer up if you read this and believe such a project could have saved you time in your current development process. That way I can see whether I should put more time into the project.

I think that Gaurav Mantri's answer is more spot-on for the question above, but I just wanted to share my ideas on the topic.