9
votes

I am relatively new to Azure storage and have been implementing a solution for some time now. And I keep hitting obstacles, making me feel that I'm not applying the right storage type for the data I'm storing.

So this is more of an overall question:

  • When should I use Azure SQL?
  • When should I use Azure Table storage?
  • When should I use Azure Blobs?

So far I have been using table storage a lot, and I'm now paying for it. As requirements for the solution grow I find myself unable to access the data as needed.

For instance I need to fetch the 50 latest entries in a table, but I can not user OrderBy in the query. I need to fetch the total amount of entries, but can not use Count.

I keep getting the impression that any data I plan to access regularly without knowing the exact RowKey and PartitionKey should be indexed in Azure SQL aswell as being stored in a table. Is this correct?

I also find myself recreating objects as Entity objects, but with the very severe limitations on datatypes I often end up just serializing the object into a byte array. And though a table row may hold up to 1MB a byte array on that row may only hold 64KB, at which point I end up using Blob storage instead.

So in the end I feel like I would have been better off just putting all my data in Azure SQL and indexing larger data but saving it as blobs. Of course this does not feel quite right, since that would leave Table storage with no real purpose.

So I'm wondering if there are any guidelines for when to use which kind of storage.

In my case I have very large amount of data in some areas, some of it consumes a fair amount of space (often above 64KB), but I also need to access the data very frequently and will need to be able to filter and sort it by certain values.

  • Do I really need to index all data I plan to access in SQL?
  • And would I be better off avoiding Table for any data that could potentially exceed 64KB?

I feel like there's something I'm not doing right. Something I didn't understand. What am I missing here?

3

3 Answers

5
votes

The best recommendation I can make is basically, "Try really hard not to use Azure Table Storage". As other folks have pointed out, it's not just a "No-SQL" data-store, it's a particularly stunted, handicapped, and very-low-featured instance of a No-SQL store. About the only thing good about it is that you can put lots and lots of data into it very quickly, and with minimal storage fees. However, you basically can't hope to get that data back out again unless you're lucky enough to have a use-case that magically matches its Partition-Key/Row-Key storage model. If you don't - and I suspect very few people do - you're going to be doing a lot of partition scans, and processing the data yourself.

Beyond that, Azure Table Storage seems to be at a dead-end in terms of development. If you look at the "Support Secondary Indexes" request on the Azure feedback forums (https://feedback.azure.com/forums/217298-storage/suggestions/396314-support-secondary-indexes), you can see that support for Secondary Indexes was promised as far back as 2011, but no progress has been made. Nor has any progress been made on any of the other top requests for Table Storage.

Now, I know that Scott Guthrie is a quality guy, so my hope is that all this stagnation on the Table Storage front is a preface to Azure fixing it and coming up with something really cool. That's my hope (though I have zero evidence that's the case). But for right now, unless you don't have a choice, I'd strongly recommend against Azure Table Storage. Use Azure SQL; use your own instance of MongoDB or some other No-SQL DB; or use Amazon DynamoDB. But don't use Azure Table Storage.

EDIT: 2014-10-09 - Having been forced into a scenario where I needed to use it, I've modified my opinion on Azure Table Storage slightly. It does in fact have all the regrettable limitations I ascribe to it above, but it also has its (limited) uses. I go into them somewhat on a blog post here.

EDIT: 2017-02-09 - Nah, ATS is still awful. Steer clear of it. It hasn't improved significantly in 7+ years, and MS obviously wishes it would just go away. And it probably should - they're presumably only keeping it around for folks who made the mistake of betting on it originally.

1
votes

have a look at this: Windows Azure Table Storage and Windows Azure SQL Database - Compared and Contrasted

doesn't include blobs, but a good read anyway...

1
votes

I keep getting the impression that any data I plan to access regularly without knowing the exact RowKey and PartitionKey should be indexed in Azure SQL aswell as being stored in a table. Is this correct?

Table storage does not support secondary indexes and so any efficient queries should contain the RowKey and the PartitionKey. There can be workarounds such as saving the same data twice in the same table with different RowKeys. However this can quickly become a pain. If eventual consistency is ok then you could do this. You need to take care of transactions and rollbacks.

In my case I have very large amount of data in some areas, some of it consumes a fair amount of space (often above 64KB), but I also need to access the data very frequently and will need to be able to filter and sort it by certain values.

Use table storage for basic NoSQL functionality and the ability to scale quickly. However, if you want secondary indexes and other such features you might have to take a look at something like DynamoDB on AWS which afaik seems to have better support for secondary indexes etc. If you have data that has complex relationships in other words data that requires an RDBMS go with SQL Azure.

Now, as far as your options on Azure go I'd think you would need to store everything on SQL Azure and large objects as blobs or on table storage.

Do I really need to index all data I plan to access in SQL?

Tough to say. If each partition is going to contain say just 100 rows then you can query by partition key and any of the columns. At this point the partition scan is going to be pretty fast. However, if you have a million rows then it could be a problem.

I feel like there's something I'm not doing right. Something I didn't understand. What am I missing here?

A bunch of early Azure users started using Table Storage without understanding what NoSQL (and in this case a particularly stunted version of NoSQL) entails.