I am wondering how to achieve over 20,000 requests per second against Azure storage accounts. I understand that data needs to be spread across multiple storage accounts in order to surpass the per-account limits, but I am unable to achieve this with my current code. I have reached around 20,000 requests per second with an individual account, but my performance doesn't improve (it usually decreases) when I add multiple storage accounts.
Some info on how the data is stored in the storage accounts and the code behind it:

1. There is one table per storage account.
2. Each table is partitioned by the first three characters of a hash of the entity's id (I have played around with higher and lower).
3. Each partition contains about 15 records.
4. Each storage account contains the exact same data (this is for test purposes).
5. There are currently 3 storage accounts.
6. 50,000 records take 2 minutes and 16 seconds to retrieve.
7. I'm also using ServicePointManager with a default connection limit of 100 and Nagle turned off.
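For reference, this is roughly how the ServicePointManager settings described above would be applied once at startup (a sketch; `Expect100Continue = false` is a common companion setting I've added, not something stated in the question):

```csharp
using System.Net;

// Apply once, before the first request to the storage endpoints is made.
ServicePointManager.DefaultConnectionLimit = 100; // allow up to 100 concurrent connections per endpoint
ServicePointManager.UseNagleAlgorithm = false;    // Nagle off: don't buffer small payloads
ServicePointManager.Expect100Continue = false;    // skip the extra 100-Continue round trip
```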
Here is some sample code for a large query:
public void retrievePartitionList<T1>(List<T1> entityList)
where T1 : ITableEntity, new()
{
int queryCountMax = 100; //Needed at 100 to stay under the URI length limit
var partitionGroup = entityList.GroupBy(m => m.PartitionKey);
List<TableQuery<T1>> queryList = new List<TableQuery<T1>>();
List<Task<TableQuerySegment<T1>>> taskList = new List<Task<TableQuerySegment<T1>>>();
//I have three storage accounts I'm retrieving from. Ideally I want 20k+ throughput for each storage account added.
var cloudTable1 = getTableClientForTenant(1);
var cloudTable2 = getTableClientForTenant(2);
var cloudTable3 = getTableClientForTenant(3);
var tenantTable = cloudTable1.GetTableReference(BATableStorageContainerPrefixes.tableStorage + 1.ToString());
var tenTenTable2 = cloudTable2.GetTableReference(BATableStorageContainerPrefixes.tableStorage + 1.ToString());
var tenTable3 = cloudTable3.GetTableReference(BATableStorageContainerPrefixes.tableStorage + 1.ToString());
foreach (var partition in partitionGroup)
{
string rowFilters = "";
var partitionList = partition.ToList();
var partitionFilter = TableQuery.GenerateFilterCondition(TableConstants.PartitionKey, QueryComparisons.Equal, partition.Key);
for (var i = 0; i < partitionList.Count; i++)
{
var item = partitionList[i];
if (string.IsNullOrEmpty(rowFilters))
{
rowFilters = TableQuery.GenerateFilterCondition(TableConstants.RowKey, QueryComparisons.Equal, item.RowKey);
}
else
{
var newFilter = "(" + TableQuery.GenerateFilterCondition(TableConstants.RowKey, QueryComparisons.Equal, item.RowKey) + ")";
rowFilters += " or " + newFilter;
}
if ((i + 1) % queryCountMax == 0 || i == partitionList.Count - 1)
{
rowFilters = TableQuery.CombineFilters(partitionFilter, TableOperators.And, rowFilters);
TableQuery<T1> innerQuery = new TableQuery<T1>().Where(rowFilters);
innerQuery.TakeCount = TableConstants.TableServiceMaxResults;
queryList.Add(innerQuery);
//Spread the tasks across the storage accounts round-robin. (Constructing a new
//Random on every iteration of a tight loop can reuse the same seed and send
//every query to the same account, so a deterministic rotation is safer here.)
//Each storage account contains the same complete data set, so no matter where
//a query goes it returns the correct results.
var tenantTask = tenantTable.ExecuteQuerySegmentedAsync(innerQuery, null);
var accountIndex = queryList.Count % 3;
if (accountIndex == 1)
{
tenantTask = tenTenTable2.ExecuteQuerySegmentedAsync(innerQuery, null);
//Debug.WriteLine("second tenant");
}
else if (accountIndex == 2)
{
//Debug.WriteLine("third tenant");
tenantTask = tenTable3.ExecuteQuerySegmentedAsync(innerQuery, null);
}
taskList.Add(tenantTask);
rowFilters = "";
}
}
}
List<T1> finalResults = new List<T1>();
//I have messed around with parallelism and 8 is usually the best for the machine I'm on
Parallel.ForEach(taskList, new ParallelOptions { MaxDegreeOfParallelism = 8 }, task =>
{
var segment = task.Result; //Blocks until this query's first segment is available
lock (finalResults)
{
finalResults.AddRange(segment);
}
});
Debug.WriteLine(finalResults.Count); //Just to show the count of results received
}
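For comparison, the collection step at the end can also be written without blocking thread-pool threads on `.Result`, by awaiting all the segmented queries together (a sketch against the same `taskList` built above; the method name is my own):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

// Non-blocking alternative to the Parallel.ForEach collection step:
// all queries run concurrently and are awaited together.
static async Task<List<T1>> collectResultsAsync<T1>(List<Task<TableQuerySegment<T1>>> taskList)
    where T1 : ITableEntity, new()
{
    var segments = await Task.WhenAll(taskList);
    // Flatten each result segment into one combined list.
    return segments.SelectMany(segment => segment).ToList();
}
```

This avoids tying up 8 threads just to wait on I/O, though it still only reads the first segment of each query, as the original code does.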
So what I'm looking for is something that will add about 20,000 requests per second of throughput for each storage account added. I have tried running this on an S2 Azure web app with 10 instances but came back with poor results: about 2 minutes and 16 seconds for 50,000 records, even when all the partition keys and row keys are known up front.
EDIT To further explain the situation: the table entity being inserted is rather small. It only has:

- a row key that is an int
- a partition key that is a 3-character hash of that int
- one property that is always the same 10-digit int
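For concreteness, the entity described above might look something like this (the class name, property name, and the specific hash function are my own assumptions; the question only says the partition key is a 3-character hash of the id):

```csharp
using System;
using Microsoft.WindowsAzure.Storage.Table;

public class SmallEntity : TableEntity
{
    public SmallEntity() { }

    public SmallEntity(int id)
    {
        RowKey = id.ToString();
        // 3-character partition key derived from a hash of the id.
        // Knuth multiplicative hash used here as an illustrative stand-in.
        uint hash = (uint)id * 2654435761u;
        PartitionKey = hash.ToString("x8").Substring(0, 3);
        Value = 1234567890; // always the same 10-digit int
    }

    public int Value { get; set; }
}
```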