I am wondering how to achieve over 20,000 requests per second against Azure storage accounts. I understand that data needs to be spread across multiple storage accounts in order to surpass the per-account limits, but I am unable to achieve this with my current code. I have reached around 20,000 requests per second with an individual account, but my performance doesn't improve (it usually decreases) when I add multiple storage accounts.
Some info on how the data is stored in the storage accounts and the code behind it:

1. There is one table per storage account.
2. Each table is partitioned by the first three characters of a hash of the entity's id (I have played around with higher and lower).
3. Each partition contains about 15 records.
4. Each storage account contains the exact same data (this is for test purposes).
5. There are currently 3 storage accounts.
6. 50,000 records take 2 minutes and 16 seconds to retrieve.
7. I'm also using ServicePointManager with a default connection limit of 100 and Nagle turned off.
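For reference, this is roughly how the ServicePointManager settings described above would be applied once at startup (a sketch; `Expect100Continue = false` is a common companion setting I've added, not something stated in the question):

```csharp
using System.Net;

// Apply once, before the first request to the storage endpoints is made.
ServicePointManager.DefaultConnectionLimit = 100; // allow up to 100 concurrent connections per endpoint
ServicePointManager.UseNagleAlgorithm = false;    // Nagle off: don't buffer small payloads
ServicePointManager.Expect100Continue = false;    // skip the extra 100-Continue round trip
```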
Here is some sample code for a large query:
public void retrievePartitionList<T1>(List<T1> entityList)
where T1 : ITableEntity, new()
{
int queryCountMax = 100; //Needed at 100 to stay under the URI length limit
var partitionGroup = entityList.GroupBy(m => m.PartitionKey);
List<TableQuery<T1>> queryList = new List<TableQuery<T1>>();
List<Task<TableQuerySegment<T1>>> taskList = new List<Task<TableQuerySegment<T1>>>();
//I have three storage accounts I'm retrieving from. Ideally I want 20k+ throughput for each storage account added.
var cloudTable1 = getTableClientForTenant(1);
var cloudTable2 = getTableClientForTenant(2);
var cloudTable3 = getTableClientForTenant(3);
var tenantTable = cloudTable1.GetTableReference(BATableStorageContainerPrefixes.tableStorage + 1.ToString());
var tenTenTable2 = cloudTable2.GetTableReference(BATableStorageContainerPrefixes.tableStorage + 1.ToString());
var tenTable3 = cloudTable3.GetTableReference(BATableStorageContainerPrefixes.tableStorage + 1.ToString());
foreach (var partition in partitionGroup)
{
string rowFilters = "";
var partitionList = partition.ToList();
var partitionFilter = TableQuery.GenerateFilterCondition(TableConstants.PartitionKey, QueryComparisons.Equal, partition.Key);
for (var i = 0; i < partitionList.Count; i++)
{
var item = partitionList[i];
if (string.IsNullOrEmpty(rowFilters))
{
rowFilters = TableQuery.GenerateFilterCondition(TableConstants.RowKey, QueryComparisons.Equal, item.RowKey);
}
else
{
var newFilter = "(" + TableQuery.GenerateFilterCondition(TableConstants.RowKey, QueryComparisons.Equal, item.RowKey) + ")";
rowFilters += " or " + newFilter;
}
if ((i + 1) % queryCountMax == 0 || i == partitionList.Count - 1)
{
rowFilters = TableQuery.CombineFilters(partitionFilter, TableOperators.And, rowFilters);
TableQuery<T1> innerQuery = new TableQuery<T1>().Where(rowFilters);
innerQuery.TakeCount = TableConstants.TableServiceMaxResults;
queryList.Add(innerQuery);
//Spread the tasks across the storage accounts round-robin. (Constructing a new
//Random on every iteration of a tight loop can reuse the same seed and send
//every query to the same account, so a deterministic rotation is safer here.)
//Each storage account contains the same complete data set, so no matter where
//a query goes it returns the correct results.
var tenantTask = tenantTable.ExecuteQuerySegmentedAsync(innerQuery, null);
var accountIndex = queryList.Count % 3;
if (accountIndex == 1)
{
tenantTask = tenTenTable2.ExecuteQuerySegmentedAsync(innerQuery, null);
//Debug.WriteLine("second tenant");
}
else if (accountIndex == 2)
{
//Debug.WriteLine("third tenant");
tenantTask = tenTable3.ExecuteQuerySegmentedAsync(innerQuery, null);
}
taskList.Add(tenantTask);
rowFilters = "";
}
}
}
List<T1> finalResults = new List<T1>();
//I have messed around with parallelism and 8 is usually the best for the machine I'm on
Parallel.ForEach(taskList, new ParallelOptions { MaxDegreeOfParallelism = 8 }, task =>
{
var segment = task.Result; //Blocks until this query's first segment is available
lock (finalResults)
{
finalResults.AddRange(segment);
}
});
Debug.WriteLine(finalResults.Count); //Just to show the count of results received
}
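For comparison, the collection step at the end can also be written without blocking thread-pool threads on `.Result`, by awaiting all the segmented queries together (a sketch against the same `taskList` built above; the method name is my own):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

// Non-blocking alternative to the Parallel.ForEach collection step:
// all queries run concurrently and are awaited together.
static async Task<List<T1>> collectResultsAsync<T1>(List<Task<TableQuerySegment<T1>>> taskList)
    where T1 : ITableEntity, new()
{
    var segments = await Task.WhenAll(taskList);
    // Flatten each result segment into one combined list.
    return segments.SelectMany(segment => segment).ToList();
}
```

This avoids tying up 8 threads just to wait on I/O, though it still only reads the first segment of each query, as the original code does.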
So what I'm looking for is something that will add about 20,000 requests per second of throughput for each storage account added. I have tried running this on an S2 Azure web app with 10 instances but came back with poor results: about 2 minutes and 16 seconds for 50,000 records, even when all the partition keys and row keys are known up front.
EDIT To further explain the situation: the table entity being inserted is rather small. It only has:

- a row key that is an int
- a partition key that is a 3-character hash of that int
- one property that is always the same 10-digit int
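For concreteness, the entity described above might look something like this (the class name, property name, and the specific hash function are my own assumptions; the question only says the partition key is a 3-character hash of the id):

```csharp
using System;
using Microsoft.WindowsAzure.Storage.Table;

public class SmallEntity : TableEntity
{
    public SmallEntity() { }

    public SmallEntity(int id)
    {
        RowKey = id.ToString();
        // 3-character partition key derived from a hash of the id.
        // Knuth multiplicative hash used here as an illustrative stand-in.
        uint hash = (uint)id * 2654435761u;
        PartitionKey = hash.ToString("x8").Substring(0, 3);
        Value = 1234567890; // always the same 10-digit int
    }

    public int Value { get; set; }
}
```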