0
votes

I need help.

I want to store articles from a lot of feeds in Azure Table Storage, and I'm expecting somewhere around 100 million rows there. Initially I thought Azure Table Storage would fit my requirements, since I can design the table like this:

  • PartitionKey (will be hash of feed url)
  • RowKey (will be hash of article url)
  • Data (JSON data of article)
  • PublishedOn (DateTime when article was published)

Then retrieving a single article is really fast, because I access it by PartitionKey and RowKey.

And that worked as expected.

Now I'm trying to send a list of PartitionKeys (hashed feed URLs) plus pagination parameters (pageSize + currentPage). The first page of results should contain the most recent articles, so the results need to be ordered by the PublishedOn column somehow.

With the above design I would need to get all rows from the requested partitions, put them in one list, order them, take the ones that belong to the requested page, and return them...
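To make the cost of that approach concrete, here is a minimal in-memory sketch of it. The dict standing in for the table, the `naive_page` helper, and the 1-based `current_page` are assumptions for illustration only; nothing here calls Azure.

```python
from datetime import datetime


def naive_page(partitions, feed_keys, page_size, current_page):
    """Load every row of every requested partition, sort, then slice.

    partitions maps a PartitionKey to a list of row dicts (hypothetical
    stand-in for the table). current_page is assumed to be 1-based.
    """
    rows = [row for key in feed_keys for row in partitions.get(key, [])]
    # Newest first: this sort touches every row that was loaded.
    rows.sort(key=lambda r: r["PublishedOn"], reverse=True)
    start = (current_page - 1) * page_size
    return rows[start:start + page_size]


table = {
    "feedA": [
        {"RowKey": "a1", "PublishedOn": datetime(2015, 1, 1)},
        {"RowKey": "a2", "PublishedOn": datetime(2015, 1, 3)},
    ],
    "feedB": [
        {"RowKey": "b1", "PublishedOn": datetime(2015, 1, 2)},
    ],
}

first_page = naive_page(table, ["feedA", "feedB"], page_size=2, current_page=1)
second_page = naive_page(table, ["feedA", "feedB"], page_size=2, current_page=2)
```

Every request reads and sorts all rows of all requested feeds, even when only the first few are returned, which is the part that worries me at this scale.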

Is this even possible to accomplish with Azure Table Storage, or should I move to Azure SQL? Could I expect better performance for such a query there with 100 million records?

Thanks,


2 Answers

1
votes

In your current design:

  • PartitionKey (will be hash of feed url)
  • RowKey (will be hash of article url)

With this design it's not feasible to get the latest articles page by page. To support that, you need to alter your design. For details, see the Log Tail Pattern. It suggests using the RowKey to store the log time (PublishedOn in your case), but I assume you still want to query an article efficiently by feed URL and article URL. If my assumption is correct, consider the Inter-Partition Secondary Index Pattern as well. You can combine both patterns in your design.
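As a rough illustration of the Log Tail Pattern, the RowKey can be built from an inverted timestamp so that the newest article sorts first within a partition. This is only a sketch: the `log_tail_row_key` and `to_ticks` helpers, the 8-character hash suffix (added so two articles published at the same instant don't collide), and the use of .NET-style ticks are my assumptions, not a prescribed format.

```python
import hashlib
from datetime import datetime

# DateTime.MaxValue.Ticks in .NET (100 ns units since 0001-01-01).
DOTNET_MAX_TICKS = 3155378975999999999


def to_ticks(dt):
    """Convert a naive UTC datetime to .NET-style ticks."""
    delta = dt - datetime(1, 1, 1)
    whole_seconds = delta.days * 86400 + delta.seconds
    return whole_seconds * 10_000_000 + delta.microseconds * 10


def log_tail_row_key(published_on, article_url):
    """Build a RowKey that sorts newest-first in lexical order."""
    inverted = DOTNET_MAX_TICKS - to_ticks(published_on)
    # Hypothetical uniqueness suffix derived from the article URL.
    suffix = hashlib.sha1(article_url.encode("utf-8")).hexdigest()[:8]
    # Zero-pad to 19 digits so string order matches numeric order.
    return f"{inverted:019d}_{suffix}"


older = log_tail_row_key(datetime(2015, 1, 1), "http://example.com/a")
newer = log_tail_row_key(datetime(2015, 6, 1), "http://example.com/b")
```

Because the table returns entities in RowKey order within a partition, `newer` comes back before `older`. The trade-off is that the RowKey is no longer just the article URL hash, which is why the secondary index is needed for point lookups by URL.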

However, there is still a concern: my proposal only makes it efficient to fetch the latest articles page by page within exactly one PartitionKey. For now I can't think of a good design for fetching the latest articles across multiple specified PartitionKeys, and I'm looking forward to being enlightened by other talented people. :)
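One possible direction for the multi-partition case, offered as a sketch rather than a recommended design: if every partition is already sorted newest-first by RowKey, the client can take only the first pageSize * currentPage rows from each requested partition and merge those sorted heads. The `merged_page` helper and the in-memory dict are hypothetical; the slice on each partition stands in for a per-partition query limited with `$top`.

```python
import heapq
from itertools import islice


def merged_page(partitions, feed_keys, page_size, current_page):
    """Merge the sorted heads of several partitions and return one page.

    Each partition list is assumed to be sorted ascending by RowKey,
    where RowKey starts with an inverted timestamp (newest first).
    """
    needed = page_size * current_page
    # No partition can contribute more than `needed` rows to this page.
    heads = [partitions.get(key, [])[:needed] for key in feed_keys]
    merged = heapq.merge(*heads, key=lambda r: r["RowKey"])
    return list(islice(merged, needed - page_size, needed))


table = {
    "feedA": [{"RowKey": "0001_a"}, {"RowKey": "0004_a"}],
    "feedB": [{"RowKey": "0002_b"}, {"RowKey": "0003_b"}],
}

page1 = merged_page(table, ["feedA", "feedB"], page_size=2, current_page=1)
page2 = merged_page(table, ["feedA", "feedB"], page_size=2, current_page=2)
```

This still reads up to pageSize * currentPage rows per feed, so it gets more expensive for deep pages and long feed lists. It avoids loading whole partitions, but it does not remove the fan-out.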

Regarding pagination, Azure Table Storage supports it through the $top query option together with the x-ms-continuation-NextPartitionKey and x-ms-continuation-NextRowKey response headers. If you're using the Azure Storage .NET Client Library, use TableQuery.TakeCount and TableQuerySegment.ContinuationToken.
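The continuation mechanism can be pictured with a small simulation. This is not the Azure SDK: `query_segment` is a hypothetical function over an in-memory list, and the token is modeled as the (PartitionKey, RowKey) pair of the next row, mirroring the two continuation headers.

```python
def query_segment(rows, take, token=None):
    """Return up to `take` rows plus a token for the next segment.

    rows must be sorted by (PartitionKey, RowKey). The token is None
    when there is nothing left to read.
    """
    start = 0
    if token is not None:
        # Resume at the first row at or after the token position.
        start = next(
            i for i, r in enumerate(rows)
            if (r["PartitionKey"], r["RowKey"]) >= token
        )
    end = start + take
    segment = rows[start:end]
    next_token = None
    if end < len(rows):
        nxt = rows[end]
        next_token = (nxt["PartitionKey"], nxt["RowKey"])
    return segment, next_token


rows = [{"PartitionKey": "feedA", "RowKey": f"{i:04d}"} for i in range(5)]

pages = []
token = None
while True:
    segment, token = query_segment(rows, take=2, token=token)
    pages.append([r["RowKey"] for r in segment])
    if token is None:
        break
```

Note that this style of paging is forward-only: to reach page N the client has to walk through the tokens of the pages before it, or cache them, so a currentPage parameter does not map directly onto it.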

BTW, please note that the maximum size of an Azure Table entity is 1 MB. If an article may exceed that limit, store the article in Azure Blob Storage and save only the blob link in Azure Table.
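A size check along these lines could decide where each article goes. It is a sketch under assumptions: `storage_target` is a hypothetical helper, and it also applies the 64 KiB limit on a single string property, which as far as I know is the tighter constraint if the whole JSON document lives in one Data property (string values are stored as UTF-16).

```python
import json

MAX_ENTITY_BYTES = 1024 * 1024          # 1 MiB per entity
MAX_STRING_PROPERTY_BYTES = 64 * 1024   # 64 KiB per string property


def storage_target(article):
    """Return "table" if the JSON fits in one string property, else "blob"."""
    payload = json.dumps(article)
    # Estimate the stored size using UTF-16 encoding of the JSON text.
    size = len(payload.encode("utf-16-le"))
    limit = min(MAX_ENTITY_BYTES, MAX_STRING_PROPERTY_BYTES)
    return "table" if size <= limit else "blob"


small = storage_target({"title": "Short post", "body": "Hello"})
large = storage_target({"title": "Long post", "body": "x" * 40000})
```

With the blob route, the table entity would keep PartitionKey, RowKey, PublishedOn, and a link to the blob instead of the JSON itself.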

0
votes

It sounds like Azure Tables will be able to support your requirements, but to get optimal performance you should make sure you understand the best practices for working with Azure Storage Tables.

If you think your scenario might be better suited to Azure SQL, check out this article to help you decide whether Azure SQL or Azure Storage Tables is the better fit.

As a side note, if you design your app well, you should be able to use the Top N support provided for table queries to limit the amount of data sent to your clients.